On Information Self-Locking in Reinforcement Learning for Active Reasoning of LLM agents

ArXiv CS.AI, 2026-06-02. Authors: Deyu Zou, Yongqiang Chen, Fan Feng, Mufei Li, Pan Li, Yu Gong, James Cheng

Abstract

arXiv:2603.12109v2. Reinforcement learning (RL) has become a de facto paradigm for building LLM-based agents that act, interact, and reason over extended task horizons. However, in active reasoning, where agents must elicit new observations through interaction with the environment to solve the task, we find that outcome-based RL can induce a systematic failure mode, which we call information self-locking (SeL): agents fail both to elicit informative feedback and to internalize the evidence they obtain. To understand the issue, we decompose agentic behavior into two coupled capabilities: Action Selection (AS), which determines the observation stream, and Belief Tracking (BT), which updates the agent's internal understanding of the task. Theoretical and empirical analyses reveal a bidirectional bottleneck that leads to SeL: weak BT obscures the credit of informative actions, while weak AS deprives BT of useful evidence.
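The bidirectional bottleneck can be illustrated with a toy sketch. This is not the paper's environment, method, or analysis; it is a minimal construction under stated assumptions. A hidden target is one of `N` candidates, an "informative" query reveals one bit of the target, an "uninformative" query always returns the same observation, and a hypothetical `bt_strength` knob controls how much of the Bayesian update the agent actually absorbs. All names here (`informative_query`, `uninformative_query`, `update_belief`, `expected_outcome`, `bt_strength`) are assumptions introduced for illustration.

```python
N = 8  # hypothetical number of candidate answers (3 bits identify the target)

def informative_query(target, step):
    # Reveals bit `step` of the target, halving the consistent candidates.
    return (target >> step) & 1

def uninformative_query(target, step):
    # Same observation for every target, so it carries no evidence.
    return 0

def update_belief(belief, query, step, obs, bt_strength):
    # Exact Bayesian posterior, blended with the prior by bt_strength in [0, 1].
    # bt_strength = 1 is perfect belief tracking; 0 ignores all evidence.
    likelihood = [1.0 if query(h, step) == obs else 0.0 for h in range(N)]
    post = [b * l for b, l in zip(belief, likelihood)]
    z = sum(post)  # > 0 because the true target is always consistent
    post = [p / z for p in post]
    return [(1 - bt_strength) * b + bt_strength * p
            for b, p in zip(belief, post)]

def expected_outcome(query, bt_strength, steps=3):
    # Outcome-style score: final belief mass on the true target,
    # averaged over all possible targets.
    total = 0.0
    for target in range(N):
        belief = [1.0 / N] * N
        for step in range(steps):
            obs = query(target, step)
            belief = update_belief(belief, query, step, obs, bt_strength)
        total += belief[target]
    return total / N

# Credit that the outcome assigns to choosing informative actions,
# measured as the outcome gap between the two action policies.
gap_strong_bt = (expected_outcome(informative_query, 1.0)
                 - expected_outcome(uninformative_query, 1.0))
gap_weak_bt = (expected_outcome(informative_query, 0.0)
               - expected_outcome(uninformative_query, 0.0))
print(gap_strong_bt, gap_weak_bt)
```

In this toy setting, the outcome gap between informative and uninformative actions vanishes when belief tracking ignores evidence, so an outcome-only signal cannot distinguish good queries from bad ones. Conversely, even perfect belief tracking stays at the uniform prior when every query is uninformative.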
