OGER: A Robust Offline-Guided Exploration Reward for Hybrid Reinforcement Learning · Event

PRODUCT_LAUNCH · 2026-05-28 · Impact: MEDIUM

arXiv:2604.18530v2 · Announce Type: replace

Abstract: Recent advancements in Reinforcement Learning with Verifiable Rewards (RLVR) have significantly improved Large Language Model (LLM) reasoning, yet models often struggle to explore novel trajectories beyond their initial policy distribution. While offline teacher guidance and entropy-driven strategies have been proposed to address this, they often lack deep integ

Related technologies