SUPERNOVA: Eliciting General Reasoning in LLMs with Reinforcement Learning on Natural Instructions
Event: PRODUCT_LAUNCH · 2026-06-05 · Impact: MEDIUM
arXiv:2604.08477v2
Announce Type: replace-cross
Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has substantially improved reasoning in formal domains such as mathematics and code, but extending these gains beyond STEM remains challenging. Extending RLVR beyond STEM is fundamentally constrained by the lack of high-quality verifiable training data. In this work, we introduce SUPERNO