Repetition Mismatch: Why Data Mixture Experiments Don't Scale and How to Fix Them

Event: PRODUCT_LAUNCH | 2026-06-09 | Impact: MEDIUM

arXiv:2606.07597v1 Announce Type: cross

Abstract: Pre-training data mixtures are commonly tuned by running small-scale experiments and extrapolating to the target training budget. When high-quality data is scarce and must be repeated, this extrapolation frequently fails, but the source of the failure has not been isolated. We show that a primary culprit is a repetition mismatch: because high-quality datasets are s