Agent-ToM: Learning to Monitor Autonomous LLM Agents via Theory-of-Mind Reasoning (Event)

PRODUCT_LAUNCH · 2026-05-26 · Impact: MEDIUM

arXiv:2605.24216v1 · Announce Type: cross

Abstract: Monitoring autonomous large language model (LLM) agents for covert malicious behavior is challenging due to delayed, context-dependent, and long-horizon attack patterns. Agents may pursue hidden objectives while maintaining superficially benign behavior, making detection difficult even with full trajectory access. Prior monitoring approaches improve scaffolding or

Agent-ToM: Learning to Monitor Autonomous LLM Agents via Theory-of-Mind Reasoning · Related Technologies