OGER: A Robust Offline-Guided Exploration Reward for Hybrid Reinforcement Learning
Event: PRODUCT_LAUNCH
Date: 2026-05-28
Impact: MEDIUM
arXiv:2604.18530v2
Announce Type: replace
Abstract: Recent advancements in Reinforcement Learning with Verifiable Rewards (RLVR) have significantly improved Large Language Model (LLM) reasoning, yet models often struggle to explore novel trajectories beyond their initial policy distribution. While offline teacher guidance and entropy-driven strategies have been proposed to address this, they often lack deep integ