A Behavioural and Representational Evaluation of Goal-Directedness in Language Model Agents

arXiv cs.CL · 2026-06-01 · Authors: Raghu Arghal, Fade Chen, Niall Dalton, Evgenii Kortukov, Calum McNamara, Angelos Nalmpantis, Moksh Nirvaan, Gabriele Sarti, Mario Giulianelli

Abstract

arXiv:2602.08964v2. Understanding an agent's goals helps explain and predict its behaviour, yet there is no established methodology for reliably attributing goals to agentic systems. We propose a framework for evaluating goal-directedness that integrates behavioural evaluation with interpretability-based analyses of models' internal representations. As a case study, we examine an LLM agent navigating a 2D grid world towards a goal state. Behaviourally, we evaluate the agent against optimal policies across varying grid sizes, obstacle densities, and goal structures, finding that performance scales with task difficulty while remaining robust to difficulty-preserving transformations and multi-goal structures. We then use probing methods to decode internal representations of the environment and multi-step action plans.
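To make the behavioural comparison against an optimal policy concrete, here is a minimal sketch of one way such a score could be computed in a 2D grid world. It is an illustration under stated assumptions, not the authors' evaluation code. The grid encoding (`.` free, `#` obstacle, `G` goal), the four-action move set, the rule that a blocked move leaves the agent in place, and the names `distances_to_goal` and `optimal_action_rate` are all hypothetical. The start cell is assumed to be able to reach the goal.

```python
from collections import deque

# Hypothetical encoding: '.' free cell, '#' obstacle, 'G' goal.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def distances_to_goal(grid):
    """Breadth-first search outward from the goal.

    Returns {(row, col): shortest number of steps to the goal} for every
    cell that can reach the goal. Moves are reversible, so searching from
    the goal gives the same distances as searching towards it.
    """
    rows, cols = len(grid), len(grid[0])
    goal = next((r, c) for r in range(rows) for c in range(cols) if grid[r][c] == "G")
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in MOVES.values():
            nr, nc = r + dr, c + dc
            in_bounds = 0 <= nr < rows and 0 <= nc < cols
            if in_bounds and grid[nr][nc] != "#" and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return dist


def optimal_action_rate(grid, start, actions):
    """Fraction of the agent's actions that move one step closer to the goal."""
    dist = distances_to_goal(grid)
    pos, optimal = start, 0
    for action in actions:
        dr, dc = MOVES[action]
        nxt = (pos[0] + dr, pos[1] + dc)
        if nxt not in dist:
            # Assumption: a move into a wall or obstacle leaves the agent in place.
            continue
        if dist[nxt] == dist[pos] - 1:
            optimal += 1
        pos = nxt
    return optimal / len(actions) if actions else 0.0


grid = [
    "..#.",
    ".#..",
    "...G",
]
trajectory = ["down", "down", "right", "right", "right"]
print(optimal_action_rate(grid, (0, 0), trajectory))
```

A per-step score like this is only one possible behavioural metric. Total path length relative to the shortest path, or success rate, could be derived from the same distance map.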
