Graph-Enhanced Policy Optimization in LLM Agent Training 文章

ArXiv CS.AI2026-05-29NEWSen作者: Jiazhen Yuan, Zhike Gong, Jinquan Hang, Zhengbiao Bai, Wei Zhao

摘要

arXiv:2510.26270v2 Announce Type: replace Abstract: Multi-step LLM agents in interactive environments represent a crucial step toward long-horizon decision-making. To train such agents, group-based reinforcement learning is widely adopted, which reinforces trajectories with higher relative performance within the group. However, in most existing methods, every step within a trajectory and every trajectory with the same terminal reward receive identical credit, regardless of their actual contributions. Since different states play different structural roles in an online state-transition graph built from sampled trajectories, their impacts should be differentiated and converted into task-aware credit at both the step and trajectory levels. We therefore present Graph-Enhanced Policy Optimization (GEPO), a framework for dual-level structural credit assignment in multi-step LLM agent training.

Graph-Enhanced Policy Optimization in LLM Agent Training 文章

摘要

相关事件查看全部 (1)

相关公司

相关人物

相关产品

相关技术查看全部 (1)