Tool-Aware Optimization with Entropy Guidance for Efficient Agentic Reinforcement Learning 文章

ArXiv CS.AI2026-06-03NEWSen作者: Hongye Cao, Nuo Yan, Haoyuan Deng, Ziwei Wang, Tianpei Yang, Jing Huo, Yuyao Zhang, Yang Gao

查看原文 →

关系图谱

摘要

arXiv:2606.03762v1 Announce Type: cross Abstract: Agentic reinforcement learning (RL) equips large language models (LLMs) with tool-use capabilities that substantially improve reasoning on complex tasks. However, integrating external tools often destabilizes training: over-reliance on tools can induce input distribution shift, while overly conservative tool use limits effective exploration. To address this issue, we propose a unified framework TAO-RL that couples tool-aware trajectory filtering with entropy-guided exploration for efficient policy optimization. Specifically, at the data level, TAO-RL filters rollout trajectories along two criteria: discarding those where all tool invocations fail to execute, and removing those where all rollouts are either correct or incorrect, as both cases yield degenerate advantage estimates that contribute no discriminative learning signal.

Tool-Aware Optimization with Entropy Guidance for Efficient Agentic Reinforcement Learning 文章

摘要

相关事件查看全部 (1)

相关公司

相关人物

相关产品查看全部 (1)

相关技术查看全部 (3)