LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards

arXiv cs.CL · 2026-06-01 · Authors: Nianyi Lin, Jiajie Zhang, Lei Hou, Juanzi Li

Abstract

arXiv:2605.31584v1. Announce type: new.

Long-context reasoning remains a central challenge for large language models, which often fail to locate and integrate key information amid extensive distracting content. Reinforcement learning with verifiable rewards (RLVR) has shown promise for this task, yet existing methods are limited by low-confusability distractors and by sparse, outcome-only reward signals that cannot supervise intermediate reasoning steps. To address these issues, we introduce LongTraceRL. For data construction, we generate multi-hop questions via knowledge graph random walks and leverage search agent trajectories to build tiered distractors: documents the agent read but did not cite (high confusability) and documents that appeared in search results but were never opened (low confusability). The resulting training contexts are far more challenging than those built by random sampling or one-shot search.
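The data-construction step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Trajectory` record, its field names, the adjacency-list graph format, and the functions `random_walk` and `tier_documents` are all hypothetical, chosen only to show how a multi-hop path might be sampled from a knowledge graph and how retrieved documents might be split into tiers by what the agent did with them.

```python
import random
from dataclasses import dataclass


@dataclass
class Trajectory:
    """Hypothetical record of one search-agent run (field names are assumptions)."""
    retrieved: list[str]  # doc ids that appeared in any search result, in order
    opened: set[str]      # doc ids the agent actually read
    cited: set[str]       # doc ids the agent cited in its final answer


def random_walk(graph: dict[str, list[tuple[str, str]]], start: str,
                hops: int, rng: random.Random) -> list[tuple[str, str, str]]:
    """Walk up to `hops` edges from `start` without revisiting a node.

    `graph` maps an entity to its outgoing (relation, tail) edges.
    Returns the path as (head, relation, tail) triples, which could
    then be turned into a multi-hop question.
    """
    path, node, seen = [], start, {start}
    for _ in range(hops):
        options = [(rel, tail) for rel, tail in graph.get(node, []) if tail not in seen]
        if not options:  # dead end: stop early with a shorter path
            break
        rel, tail = rng.choice(options)
        path.append((node, rel, tail))
        seen.add(tail)
        node = tail
    return path


def tier_documents(traj: Trajectory) -> dict[str, list[str]]:
    """Split retrieved docs into evidence and two distractor tiers."""
    tiers: dict[str, list[str]] = {"evidence": [], "high": [], "low": []}
    for doc in dict.fromkeys(traj.retrieved):  # de-duplicate, keep order
        if doc in traj.cited:
            tiers["evidence"].append(doc)   # cited by the agent
        elif doc in traj.opened:
            tiers["high"].append(doc)       # read but not cited
        else:
            tiers["low"].append(doc)        # listed in results, never opened
    return tiers


if __name__ == "__main__":
    graph = {"A": [("r1", "B")], "B": [("r2", "C")], "C": [("r3", "A")]}
    print(random_walk(graph, "A", 3, random.Random(0)))
    traj = Trajectory(["d1", "d2", "d3", "d2", "d4"], {"d1", "d2"}, {"d1"})
    print(tier_documents(traj))
```

A training context could then be assembled by mixing the evidence documents with distractors drawn from both tiers; the mixing ratio and context length are not given in the abstract and are left open here.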
