Decouple Searching from Training: Scaling Data Mixing via Model Merging for Large Language Model Pre-training

ArXiv CS.CL · 2026-06-01 · Authors: Shengrui Li, Fei Zhao, Kaiyan Zhao, Jieying Ye, Haifeng Liu, Fangcheng Shi, Zheyong Xie, Yao Hu, Shaosheng Cao
