Voluntary Collusion with Secret Tools in Competing LLM Agents

ArXiv CS.AI | 2026-05-28 | Authors: Xijie Zeng, Frank Rudzicz

Abstract

arXiv:2605.27593v1

Even when a tool is explicitly described as unfair and harmful to others, ostensibly safety-aligned LLM agents still voluntarily engage in secret collusion whenever doing so confers a strategic advantage. To investigate this phenomenon, we introduce an empirical framework built on two strategic multi-agent environments: Liar's Bar, a competitive deception scenario, and Cleanup, a mixed-motive resource-management scenario, in which agents are offered secret collusion tools that provide significant advantages while clearly disadvantaging the other agents. Across 12 models (at the 7B, 70B, and proprietary scales) and 6 prompt variants, we find that most agents consistently accept these tools and develop collusive strategies, while explicitly acknowledging the unfairness of the tools before accepting.
