Decouple Searching from Training: Scaling Data Mixing via Model Merging for Large Language Model Pre-training
Event | PRODUCT_LAUNCH | 2026-06-01 | Impact: MEDIUM
arXiv:2602.00747v3 (Announce Type: replace)

Abstract: Determining an effective data mixture is a key factor in Large Language Model (LLM) pre-training, where models must balance general competence with proficiency on hard tasks such as math and code. However, identifying an optimal mixture remains an open challenge, as existing approaches either rely on unreliable tiny-scale proxy experiments…
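To make the notion of a "data mixture" concrete: a mixture is commonly expressed as non-negative weights over data domains (for example general web text, math, and code), which are then turned into per-domain token budgets for a pre-training run. The sketch below is a generic illustration under that assumption only. The function name `mixture_token_budgets`, the domain names, and the largest-remainder rounding are hypothetical choices, not the paper's method; the excerpt above does not describe how the authors search over mixtures.

```python
from typing import Dict


def mixture_token_budgets(weights: Dict[str, float], total_tokens: int) -> Dict[str, int]:
    """Split a pre-training token budget across domains by mixture weights.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if any(w < 0 for w in weights.values()):
        raise ValueError("mixture weights must be non-negative")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("at least one mixture weight must be positive")

    # Normalize so the weights form a probability distribution over domains.
    normalized = {d: w / total_weight for d, w in weights.items()}

    # Floor each domain's share, then give the leftover tokens to the domains
    # with the largest fractional remainders so the budgets sum exactly.
    budgets = {d: int(p * total_tokens) for d, p in normalized.items()}
    leftover = total_tokens - sum(budgets.values())
    by_remainder = sorted(
        normalized,
        key=lambda d: normalized[d] * total_tokens - budgets[d],
        reverse=True,
    )
    for d in by_remainder[:max(leftover, 0)]:
        budgets[d] += 1
    return budgets


if __name__ == "__main__":
    # Assumed example mixture: mostly general text, with smaller math and code shares.
    print(mixture_token_budgets({"web": 0.7, "math": 0.2, "code": 0.1}, 1_000_000))
```

Searching for a good mixture then amounts to choosing the weight vector; each candidate would normally require its own training run, which is the cost the paper's title suggests it aims to decouple from the search.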