Agentic Transformers Provably Learn to Search via Reinforcement Learning

ArXiv CS.AI | 2026-06-02 | Authors: Tong Yang, Yu Huang, Yingbin Liang, Yuejie Chi

Abstract

arXiv:2606.00183v1. Tree search is a central abstraction behind many language-agent reasoning and decision-making tasks: agents must explore actions, remember failures, and backtrack toward promising alternatives. Yet we lack a theoretical understanding of how transformer-based policies acquire such search capabilities from the training dynamics of reinforcement learning (RL). We study this question in a stochastic $k$-ary tree environment, where an agentic transformer observes only its trajectory history through interaction and receives a terminal reward for reaching a hidden leaf goal node. We first construct a two-head transformer that implements randomized depth-first search (DFS): one head tracks previous actions, while the other detects failure outcomes and triggers backtracking.
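For readers unfamiliar with the search procedure named in the abstract, the following is a minimal sketch of classical randomized DFS on a complete $k$-ary tree with a hidden leaf goal. It is not the paper's transformer construction. The function name `randomized_dfs`, the tuple encoding of nodes, and the deterministic transitions are all assumptions made for illustration; the environment studied in the paper is stochastic. Loosely, the `tried` memory plays the part of tracking previous actions, and the branch that fires at a failed leaf or an exhausted node plays the part of triggering backtracking.

```python
import random


def randomized_dfs(k, depth, goal, rng):
    """Randomized DFS for a hidden leaf `goal` in a complete k-ary tree.

    Illustrative assumptions: a node is the tuple of actions taken from
    the root, transitions are deterministic, and success is only revealed
    on reaching a leaf. Returns (found, trajectory).
    """
    path = []        # actions from the root to the current node
    tried = {}       # node -> set of child actions already attempted
    trajectory = []  # ("down", action) or ("back", None) steps

    while True:
        node = tuple(path)
        at_leaf = len(path) == depth
        if at_leaf and node == goal:
            return True, trajectory

        # Leaves have no children; internal nodes offer untried actions.
        if at_leaf:
            untried = []
        else:
            seen = tried.setdefault(node, set())
            untried = [a for a in range(k) if a not in seen]

        if not untried:
            # Failure: dead leaf or exhausted subtree, so backtrack.
            if not path:
                return False, trajectory  # whole tree exhausted
            path.pop()
            trajectory.append(("back", None))
            continue

        # Explore a uniformly random action not yet tried at this node.
        action = rng.choice(untried)
        tried[node].add(action)
        path.append(action)
        trajectory.append(("down", action))
```

As a usage example, `randomized_dfs(2, 3, (1, 0, 1), random.Random(0))` returns `True` together with the sequence of descents and backtracks that ends at the goal leaf. Because every edge is descended at most once, the number of `"down"` steps is bounded by the number of edges in the tree.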
