LLMSurgeon: Diagnosing Data Mixture of Large Language Models (Event)
PRODUCT_LAUNCH · 2026-05-29 · Impact: MEDIUM
arXiv:2605.30348v1 · Announce type: new
Abstract: The pretraining data mixture of Large Language Models (LLMs) constitutes their "digital DNA", shaping model behaviors, capabilities, and failure modes. Yet this composition is rarely disclosed, making post-hoc auditing of data combination or provenance difficult. In this work, we formalize Data Mixture Surgery (DMS): given only generated text from a target LLM, estimate the …
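To make the DMS setting concrete, here is a minimal baseline sketch. It is not the LLMSurgeon method, which the truncated abstract does not describe. It assumes that the quantity to estimate is a vector of domain proportions (suggested by the title, not confirmed by the excerpt). It also assumes a hypothetical domain classifier has already scored each generated sample, producing one posterior row per sample. The function `estimate_mixture` and the domain labels are illustrative names only. It applies "probabilistic classify-and-count": average the per-sample posteriors.

```python
from typing import Sequence


def estimate_mixture(posteriors: Sequence[Sequence[float]]) -> list[float]:
    """Estimate domain proportions by averaging per-sample posteriors.

    Hypothetical baseline: each row holds a (hypothetical) classifier's
    probabilities that one generated sample comes from each domain.
    """
    if not posteriors:
        raise ValueError("need at least one generated sample")
    k = len(posteriors[0])
    totals = [0.0] * k
    for row in posteriors:
        if len(row) != k:
            raise ValueError("all rows must cover the same domains")
        s = sum(row)
        if s <= 0:
            raise ValueError("each row must have a positive sum")
        for j, p in enumerate(row):
            totals[j] += p / s  # renormalise each row defensively
    n = len(posteriors)
    return [t / n for t in totals]


if __name__ == "__main__":
    # Two generated samples scored over hypothetical domains: web, code, math.
    weights = estimate_mixture([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1]])
    print([round(w, 3) for w in weights])
```

Plain averaging is biased when the classifier is miscalibrated or when the target model's generations over- or under-represent domains relative to its training data. Any serious estimator would need to correct for both effects, which this sketch does not attempt.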
Related coverage (1)
LLMSurgeon: Diagnosing Data Mixture of Large Language Models
arXiv cs.CL · 2026-05-29