Data Difficulty and the Generalization–Extrapolation Tradeoff in LLM Fine-Tuning

Event: PRODUCT_LAUNCH | Date: 2026-05-26 | Impact: MEDIUM

arXiv:2605.12906v2 | Announce Type: replace-cross

Abstract: Data selection during supervised fine-tuning (SFT) can critically change the behavior of large language models (LLMs). Although existing work has studied the effect of selecting data based on heuristics such as perplexity, difficulty, or length, the reported findings are often inconsistent or context-dependent. In this work, we systematically study the role