RL Squeezes, SFT Expands: A Comparative Study of Reasoning LLMs (Event)

Name: RL Squeezes, SFT Expands: A Comparative Study of Reasoning LLMs
Start: 2026-05-28

Type: PRODUCT_LAUNCH
Date: 2026-05-28
Impact: MEDIUM

RL Squeezes, SFT Expands: A Comparative Study of Reasoning LLMs. arXiv:2509.21128v2. Announce Type: replace. Abstract: Large language models (LLMs) are typically trained by reinforcement learning (RL) with verifiable rewards (RLVR) and supervised fine-tuning (SFT) on reasoning traces to improve their reasoning abilities. However, how these methods shape reasoning capabilities remains largely elusive. Going beyond an accuracy-based investigation of how these two components sculpt the reasoning proc

Category: Artificial Intelligence
