ROGUE: Misaligned Agent Behavior Arising from Ordinary Computer Use 文章

ArXiv CS.AI2026-06-02NEWSen作者: Jeremy Tien, Abishek Anand, Yu-Rou Tuan, Yuchen Shen, J. Zico Kolter, Aran Nayebi

摘要

arXiv:2606.00341v1 Announce Type: cross Abstract: As AI agents are increasingly deployed in real personal and corporate settings (email accounts, development workflows, company databases, etc.), safety considerations surrounding these agents become paramount. Although much work has focused on agent safety in the presence of an adversary, we show that agents can exhibit misaligned behavior even in benign settings, taking unsafe actions when those actions are instrumental to task completion. We study this failure mode through the lens of corrigibility, the safety desideratum that agents remain amenable to human correction, interruption, or shutdown. To demonstrate this tendency, we introduce a benchmark in which agents are asked to complete realistic, computer-use tasks but are confronted with a corrigibility obstacle: a human interrupt, a login page, or a shutdown notification.

ROGUE: Misaligned Agent Behavior Arising from Ordinary Computer Use 文章

摘要

相关事件查看全部 (2)

相关公司

相关人物

相关产品

相关技术