BraveGuard: From Open-World Threats to Safer Computer-Use Agents 文章

ArXiv CS.CL2026-06-02NEWSen作者: Yunhao Feng, Yifan Ding, Xiaohu Du, Ming Wen, Xinhao Deng, Yanming Guo, Yuxiang Xie, Baihui Zheng, Yingshui Tan, Yige Li, Yutao Wu, Yixu Wang, Kerui Cao, Wenke Huang, Xingjun Ma, Yu-Gang Jiang

摘要

arXiv:2606.01166v1 Announce Type: cross Abstract: Computer-use agents extend language models from text generation to sustained interaction with files, terminals, browsers, and external tools. This shift creates safety risks that are difficult to detect from isolated prompts or final responses, because harm often emerges only through multi-step execution traces whose individual actions appear locally benign. We introduce BraveGuard, a self-evolving defense framework for training guard models from open-world threat signals and realistic agent trajectories. BraveGuard mines recent research sources to identify emerging risks and attack patterns, instantiates them as executable computer-use tasks, collects agent rollouts, and derives trajectory-level supervision for guard model training. As new threats and validation failures appear, the pipeline can be repeated, yielding an adaptive defense loop rather than a static, benchmark-driven training process.

相关公司

暂无数据

相关人物

暂无数据