Human-Guided Harm Recovery for Computer Use Agents

Details

Source: ArXiv CS.CL
Authors: Christy Li, Sky CH-Wang, Andi Peng, Andreea Bobu
Article type: NEWS
Language: en
Published: 2026-05-29

Abstract

arXiv:2604.18847v2 (Announce Type: replace-cross)

As LM agents gain the ability to execute actions on real computer systems, we need ways to not only prevent harmful actions at scale but also effectively remediate harm when prevention fails. We formalize a solution to this neglected challenge in post-execution safeguards as harm recovery: the problem of optimally steering an agent from a harmful state back to a safe one in alignment with human preferences. We ground preference-aligned recovery through a formative user study that identifies valued recovery dimensions and produces a natural language rubric. Our dataset of 1,130 pairwise judgments reveals context-dependent shifts in attribute importance, such as preferences for pragmatic, targeted strategies over comprehensive long-term approaches. We operationalize these learned insights in a reward model, re-ranking multiple candidate recovery plans generated by an agent scaffold at test time.
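The abstract describes distilling pairwise human judgments into a reward model that re-ranks candidate recovery plans at test time. The paper's model is not described here, so the following is only a generic sketch of that pattern under stated assumptions: each plan is assumed to be pre-scored on a few rubric attributes, and the attribute names (`targeted`, `comprehensive`, `reversible`), the linear reward, the Bradley-Terry fit, and the helper functions are all hypothetical illustrations, not the authors' method.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical rubric attributes chosen for illustration only.
ATTRIBUTES = ("targeted", "comprehensive", "reversible")

# A plan is represented only by its rubric scores (assumed to lie in [0, 1]).
Plan = Dict[str, float]


def features(plan: Plan) -> List[float]:
    """Fixed-order feature vector; missing attributes default to 0."""
    return [plan.get(name, 0.0) for name in ATTRIBUTES]


def reward(weights: Sequence[float], plan: Plan) -> float:
    """Linear reward: weighted sum of the plan's rubric scores."""
    return sum(w * x for w, x in zip(weights, features(plan)))


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_bradley_terry(
    pairs: List[Tuple[Plan, Plan]], lr: float = 0.5, epochs: int = 500
) -> List[float]:
    """Fit weights by batch gradient ascent on the Bradley-Terry log-likelihood.

    Each pair is (preferred, rejected) and the model assumes
    P(preferred beats rejected) = sigmoid(reward(preferred) - reward(rejected)).
    """
    weights = [0.0] * len(ATTRIBUTES)
    for _ in range(epochs):
        grad = [0.0] * len(ATTRIBUTES)
        for preferred, rejected in pairs:
            diff = [a - b for a, b in zip(features(preferred), features(rejected))]
            margin = sum(w * d for w, d in zip(weights, diff))
            # Gradient of log sigmoid(margin) with respect to the weights.
            scale = 1.0 - sigmoid(margin)
            for i, d in enumerate(diff):
                grad[i] += scale * d
        weights = [w + lr * g / len(pairs) for w, g in zip(weights, grad)]
    return weights


def rerank(weights: Sequence[float], candidates: List[Plan]) -> List[Plan]:
    """Order candidate recovery plans from highest to lowest learned reward."""
    return sorted(candidates, key=lambda plan: reward(weights, plan), reverse=True)


# Toy pairwise judgments (invented) in which the more targeted plan was preferred.
pairs = [
    ({"targeted": 0.9, "comprehensive": 0.2, "reversible": 0.8},
     {"targeted": 0.3, "comprehensive": 0.9, "reversible": 0.7}),
    ({"targeted": 0.8, "comprehensive": 0.4, "reversible": 0.6},
     {"targeted": 0.2, "comprehensive": 0.8, "reversible": 0.6}),
]
# Toy candidate plans, standing in for what an agent scaffold might propose.
candidates = [
    {"targeted": 0.1, "comprehensive": 1.0, "reversible": 0.5},
    {"targeted": 1.0, "comprehensive": 0.1, "reversible": 0.5},
]
weights = fit_bradley_terry(pairs)
best_plan = rerank(weights, candidates)[0]
```

On this invented toy data the learned weight on `targeted` comes out positive and the weight on `comprehensive` negative, so `best_plan` is the second candidate. A single global linear reward keeps the sketch small; capturing the context-dependent shifts in attribute importance that the abstract reports would likely require weights that depend on the harm scenario.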
