ValueFlow: Measuring the Propagation of Value Perturbations in Multi-Agent LLM Systems

ArXiv cs.CL | 2026-05-29 | Authors: Jinnuo Liu, Chuke Liu, Hua Shen

Abstract

arXiv:2602.08567v2

Multi-agent large language model (LLM) systems increasingly consist of agents that observe and respond to one another's outputs. While value alignment is typically evaluated for isolated models, how value perturbations propagate through agent interactions remains poorly understood. We present ValueFlow, a perturbation-based framework that measures value drift in multi-agent systems via a 56-value valuation dataset derived from the Schwartz Value Survey, with agent value orientations scored using an LLM-as-a-judge protocol. ValueFlow decomposes value drift into agent-level response behavior and system-level structural effects, captured by two metrics: β-susceptibility, an agent's sensitivity to perturbed peer value signals, and system susceptibility (SS), the effect of node-level perturbations on final system outputs.
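The abstract names the two metrics without giving their formulas, so the following is only an illustrative sketch of how such quantities could be computed from judge scores, not the paper's definitions. It assumes, hypothetically, that β-susceptibility is estimated as the least-squares slope (through the origin) of an agent's score shift against the shift in its peers' value signal, and that SS is the mean absolute change in the final output's judge score between a baseline run and a run with one perturbed node. The function names and the toy numbers are invented for illustration.

```python
from statistics import mean


def beta_susceptibility(peer_shifts, agent_shifts):
    """Assumed estimator: slope through the origin of agent shift vs. peer shift.

    peer_shifts[i]  -- change in the peer value signal in trial i (judge-score units)
    agent_shifts[i] -- change in the agent's own judged value score in trial i
    """
    numerator = sum(p * a for p, a in zip(peer_shifts, agent_shifts))
    denominator = sum(p * p for p in peer_shifts)
    if denominator == 0:
        raise ValueError("peer_shifts must contain at least one nonzero perturbation")
    return numerator / denominator


def system_susceptibility(baseline_scores, perturbed_scores):
    """Assumed estimator: mean absolute change in the final output's judge score."""
    return mean(abs(p - b) for b, p in zip(baseline_scores, perturbed_scores))


# Toy data (hypothetical): the agent moves half as far as its peers were pushed.
beta = beta_susceptibility([1.0, 2.0, 3.0], [0.5, 1.0, 1.5])

# Toy data (hypothetical): final-output scores without and with a perturbed node.
ss = system_susceptibility([3.0, 4.0], [3.5, 3.0])

print(beta, ss)
```

Under these assumptions, a slope near 0 would indicate an agent that ignores perturbed peer signals and a slope near 1 one that follows them fully, while SS summarizes how much of a node-level perturbation survives to the system's final output.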