Learning When to Act or Refuse: Guarding Agentic Reasoning Models for Safe Multi-Step Tool Use · Event

PRODUCT_LAUNCH · 2026-06-04 · Impact: MEDIUM

arXiv:2603.03205v2 · Announce Type: replace

Abstract: Agentic language models operate in a fundamentally different safety regime than chat models: they must plan, call tools, and execute long-horizon actions where a single misstep, such as accessing files or entering credentials, can cause irreversible harm. Existing alignment methods, largely optimized for static generation and task completion, break d

Learning When to Act or Refuse: Guarding Agentic Reasoning Models for Safe Multi-Step Tool Use · Related coverage