RL Squeezes, SFT Expands: A Comparative Study of Reasoning LLMs (Event)

PRODUCT_LAUNCH | 2026-05-28 | Impact: MEDIUM

arXiv:2509.21128v2 | Announce Type: replace

Abstract: Large language models (LLMs) are typically trained by reinforcement learning (RL) with verifiable rewards (RLVR) and supervised fine-tuning (SFT) on reasoning traces to improve their reasoning abilities. However, how these methods shape reasoning capabilities remains largely elusive. Going beyond an accuracy-based investigation of how these two components sculpt the reasoning proc