Agent-ToM: Learning to Monitor Autonomous LLM Agents via Theory-of-Mind Reasoning · Event
PRODUCT_LAUNCH · 2026-05-26 · Impact: MEDIUM
arXiv:2605.24216v1 · Announce Type: cross
Abstract: Monitoring autonomous large language model (LLM) agents for covert malicious behavior is challenging due to delayed, context-dependent, and long-horizon attack patterns. Agents may pursue hidden objectives while maintaining superficially benign behavior, making detection difficult even with full trajectory access. Prior monitoring approaches improve scaffolding or
Related coverage
Agent-ToM: Learning to Monitor Autonomous LLM Agents via Theory-of-Mind Reasoning
ArXiv CS.CL · 2026-05-26