A Behavioural and Representational Evaluation of Goal-Directedness in Language Model Agents

Event: PRODUCT_LAUNCH | Date: 2026-06-01 | Impact: MEDIUM

arXiv:2602.08964v2 (announce type: replace-cross)

Abstract: Understanding an agent's goals helps explain and predict its behaviour, yet there is no established methodology for reliably attributing goals to agentic systems. We propose a framework for evaluating goal-directedness that integrates behavioural evaluation with interpretability-based analyses of models' internal representations. As a case study,