RL Squeezes, SFT Expands: A Comparative Study of Reasoning LLMs
Event type: PRODUCT_LAUNCH | Date: 2026-05-28 | Impact: MEDIUM
arXiv:2509.21128v2
Announce Type: replace
Abstract: Large language models (LLMs) are typically trained by reinforcement learning (RL) with verifiable rewards (RLVR) and supervised fine-tuning (SFT) on reasoning traces to improve their reasoning abilities. However, how these methods shape reasoning capabilities remains largely elusive. Going beyond an accuracy-based investigation of how these two components sculpt the reasoning proc
Related coverage (1)
RL Squeezes, SFT Expands: A Comparative Study of Reasoning LLMs
ArXiv CS.AI, 2026-05-28