EAPO: Enhancing Policy Optimization with On-Demand Expert Assistance (Event)

PRODUCT_LAUNCH | 2026-05-29 | Impact: MEDIUM

arXiv:2509.23730v2. Announce type: replace.

Abstract: Large language models (LLMs) have recently advanced in reasoning when optimized with reinforcement learning (RL) under verifiable rewards. Existing methods primarily rely on outcome-based supervision to strengthen internal LLM reasoning, often leading to inefficient exploration and sparse rewards. To mitigate this issue, we propose Expert-Assisted Policy Optimization (EAPO),