Beyond Normalization: Rethinking the Partition Function as a Difficulty Scheduler for RLVR (Event)

PRODUCT_LAUNCH | 2026-05-29 | Impact: MEDIUM

arXiv:2602.12642v2, Announce Type: replace

Abstract: Reward-maximizing RL methods have been shown to enhance the reasoning performance of LLMs, but they often reduce generation diversity. Recent works address this issue by adopting GFlowNets, training LLMs to match a target distribution while jointly learning its partition function. In contrast to prior works that treat this partition functi