Relevance as a Vulnerability: How Web Retrieval Degrades Safety Alignment in LLM Agents

arXiv cs.CL | 2026-05-29 | Authors: Aditya Nawal, Manit Baser, Mohan Gurusamy

Abstract

arXiv:2605.29224v1

AI agents augment large language models with external tools such as web retrieval, enabling grounded and up-to-date responses. However, incorporating external content into the generation pipeline can weaken the safety alignment mechanisms that govern model outputs. Prior work shows that enabling retrieval in agents increases compliance with harmful requests. We introduce AgentREVEAL, a diagnostic framework for analyzing retrieval-induced safety degradation in LLM agents. The framework examines two axes: how retrieval is integrated into the agent pipeline and the properties of the retrieved content. Along the integration axis, we find that binding tool invocation and response generation in a single step amplifies harmful outputs.
