Adversarial Feeds Steer LLM Agent Decisions Against Their Defaults

ArXiv CS.CL, 2026-06-02. Author: Rana Muhammad Usman

Abstract

arXiv:2606.00914v1 (announce type: cross)

LLM agents increasingly act after consuming ranked external information streams such as social feeds, search results, retrieval contexts, and email queues, yet safety evaluations almost always test the model or the user prompt in isolation, never the upstream ranker that decides what the agent reads just before it acts. We introduce a controlled protocol that holds the model, persona, topic, and final decision prompt fixed and varies only the composition and ordering of the posts an agent encounters during a preceding ten-turn "scrolling" phase, isolating the causal effect of feed curation on a downstream decision. Across 2,785 decision rollouts on four modern open instruct LLMs from three independent labs, we identify three response regimes: adversarial capitulation, default saturation, and a default-direction asymmetry in which a one-sided feed tips a decision the model was genuinely uncertain about (in the clearest cases from 5% to…

The abstract may be incomplete; see the original for the full text.
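
The controlled protocol described in the abstract (model, persona, topic, and decision prompt held fixed; only feed composition and ordering varied over a ten-turn scrolling phase) might be sketched roughly as follows. This is an illustrative sketch, not the authors' code: the names `Post`, `build_feed`, `build_transcript`, `run_condition`, the `pro`/`con` stance labels, the ordering options, and the `toy_model` stand-in are all assumptions introduced here. A real experiment would replace `toy_model` with calls to an actual LLM.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

SCROLL_TURNS = 10  # the abstract specifies a ten-turn "scrolling" phase


@dataclass(frozen=True)
class Post:
    stance: str  # hypothetical label: "pro" or "con"
    text: str


def build_feed(pro_fraction: float, order: str, seed: int = 0) -> List[Post]:
    """Return SCROLL_TURNS posts with the requested composition and ordering."""
    n_pro = round(pro_fraction * SCROLL_TURNS)
    posts = [Post("pro", f"pro post {i}") for i in range(n_pro)]
    posts += [Post("con", f"con post {i}") for i in range(SCROLL_TURNS - n_pro)]
    if order == "con_first":
        posts.reverse()
    elif order == "shuffled":
        random.Random(seed).shuffle(posts)
    elif order != "pro_first":
        raise ValueError(f"unknown order: {order}")
    return posts


def build_transcript(persona: str, topic: str, feed: List[Post],
                     decision_prompt: str) -> List[str]:
    """Fixed persona/topic/prompt; only the feed in the middle changes."""
    messages = [f"PERSONA: {persona}", f"TOPIC: {topic}"]
    for turn, post in enumerate(feed, start=1):
        messages.append(f"SCROLL {turn}: {post.text}")
    messages.append(f"DECIDE: {decision_prompt}")
    return messages


def run_condition(model: Callable[[List[str]], str], persona: str, topic: str,
                  decision_prompt: str, pro_fraction: float, order: str,
                  n_rollouts: int) -> float:
    """Fraction of rollouts in which the model answers 'yes'."""
    yes = 0
    for seed in range(n_rollouts):
        feed = build_feed(pro_fraction, order, seed)
        transcript = build_transcript(persona, topic, feed, decision_prompt)
        if model(transcript) == "yes":
            yes += 1
    return yes / n_rollouts


def toy_model(transcript: List[str]) -> str:
    """Placeholder for an LLM: says 'yes' if most scrolled posts are 'pro'."""
    scrolls = [m for m in transcript if m.startswith("SCROLL")]
    pro = sum("pro post" in m for m in scrolls)
    return "yes" if pro > len(scrolls) / 2 else "no"
```

Comparing `run_condition` across values of `pro_fraction` and `order`, with every other argument unchanged, is what would attribute any shift in the decision rate to feed curation alone under this sketch's assumptions.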
