Relevance as a Vulnerability: How Web Retrieval Degrades Safety Alignment in LLM Agents

REGULATION · 2026-05-29 · Impact: MEDIUM

arXiv:2605.29224v1 · Announce Type: new

Abstract: AI agents augment large language models with external tools such as web retrieval, enabling grounded and up-to-date responses. However, incorporating external content into the generation pipeline can weaken the safety alignment mechanisms that govern model outputs. Prior work shows that enabling retrieval in agents increases compliance with harmful requests. We
