Agentic Transformers Provably Learn to Search via Reinforcement Learning

ArXiv CS.AI | 2026-06-02 | Authors: Tong Yang, Yu Huang, Yingbin Liang, Yuejie Chi

Abstract

arXiv:2606.00183v1 | Announce Type: cross

Tree search is a central abstraction behind many language-agent reasoning and decision-making tasks: agents must explore actions, remember failures, and backtrack toward promising alternatives. Yet we lack a theoretical understanding of how transformer-based policies acquire such search capabilities from the training dynamics of reinforcement learning (RL). We study this question in a stochastic $k$-ary tree environment, where an agentic transformer observes only its trajectory history through interaction and receives a terminal reward for reaching a hidden leaf goal node. We first construct a two-head transformer that implements randomized depth-first search (DFS): one head tracks previous actions, while the other detects failure outcomes and triggers backtracking.
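To make the search behavior concrete, the following is a minimal Python sketch of randomized DFS on a k-ary tree with a hidden goal leaf. It is not the paper's transformer construction, and it simplifies the environment: the tree here is deterministic, whereas the abstract describes a stochastic one whose details are not given. The names `KaryTreeEnv`, `is_goal`, and `randomized_dfs` are hypothetical and introduced only for illustration. The `tried` dictionary plays the role of remembering previous actions, and `path.pop()` is the backtracking step taken after a failure.

```python
import random


class KaryTreeEnv:
    """Hypothetical k-ary tree of fixed depth with one hidden goal leaf.

    Assumption: the tree is deterministic; a node is identified by the
    tuple of child indices taken from the root.
    """

    def __init__(self, k, depth, seed=0):
        self.k = k
        self.depth = depth
        rng = random.Random(seed)
        # The goal is hidden from the agent; it is only checked at leaves.
        self.goal = tuple(rng.randrange(k) for _ in range(depth))

    def is_goal(self, path):
        return tuple(path) == self.goal


def randomized_dfs(env, seed=0):
    """Search for the goal leaf using only the agent's own history.

    Returns (goal_path, leaves_visited), or (None, leaves_visited) if the
    whole tree was exhausted without success.
    """
    rng = random.Random(seed)
    path = []    # current node, as child indices from the root
    tried = {}   # node -> set of child indices already explored from it
    leaves_visited = 0

    while True:
        if len(path) == env.depth:
            leaves_visited += 1
            if env.is_goal(path):
                return tuple(path), leaves_visited
            path.pop()  # failure at a leaf: backtrack to the parent
            continue

        node = tuple(path)
        explored = tried.setdefault(node, set())
        untried = [a for a in range(env.k) if a not in explored]

        if not untried:
            if not path:
                return None, leaves_visited  # root exhausted
            path.pop()  # every child failed: backtrack one more level
            continue

        action = rng.choice(untried)  # random order among unexplored children
        explored.add(action)
        path.append(action)


env = KaryTreeEnv(k=3, depth=4, seed=1)
found, n_leaves = randomized_dfs(env, seed=2)
print(found == env.goal, n_leaves)
```

Because each child of a node is tried at most once, the search visits every leaf at most once, so it reaches the goal after at most k^depth leaf visits in this simplified setting.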
