Test-Time Deep Thinking to Explore Implicit Rules 文章

ArXiv CS.AI2026-05-26NEWSen作者: Wentong Chen, Xin Cong, Zhong Zhang, Yaxi Lu, Siyuan Zhao, Yesai Wu, Qinyu Luo, Haotian Chen, Yankai Lin, Zhiyuan Liu, Maosong Sun

摘要

arXiv:2605.24828v1 Announce Type: new Abstract: With the continuous advancement of Large Language Models (LLMs), intelligent agents are becoming increasingly vital. However, these agents often fail in environments governed by implicit rules--hidden constraints that cannot be observed directly and must be inferred through interaction. This causes agents to fall into repetitive trial-and-error loops, ultimately leading to task failure. To address this challenge, we propose Test-Time Exploration (TTExplore), a framework where a thinker component analyzes interaction history to infer these implicit rules and guide an actor. Effective exploration in this setting critically depends on the reasoning ability of the thinker. However, evaluating deep reasoning trajectories is inherently unstable and difficult, which poses a major obstacle to effective training. To overcome this issue, we introduce a novel and stable reinforcement learning pipeline.

Test-Time Deep Thinking to Explore Implicit Rules 文章

摘要

相关事件查看全部 (1)

相关公司查看全部 (5)

相关人物

相关产品查看全部 (10)

相关技术查看全部 (22)