Decouple Searching from Training: Scaling Data Mixing via Model Merging for Large Language Model Pre-training
arXiv cs.CL, 2026-06-01
Authors: Shengrui Li, Fei Zhao, Kaiyan Zhao, Jieying Ye, Haifeng Liu, Fangcheng Shi, Zheyong Xie, Yao Hu, Shaosheng Cao