q0: Primitives for Hyper-Epoch Pretraining

ArXiv CS.AI | 2026-06-03 | News | en | Authors: Bishwas Mandal, Shmuel Berman, Akshay Vegesna, Samip Dahal

Abstract

arXiv:2606.03938v1 (announce type: cross)

Multi-epoch training is becoming the standard now that compute is growing faster than the supply of high-quality text. But pretraining a single model saturates within a few passes, long before the compute budget is exhausted. We argue this calls for a conceptual shift from training a single model toward exploring a population of models and aggregating their predictions. We introduce hyper-epoch pretraining (q0), which turns a multi-epoch budget into a population of diverse models whose combined predictions reach a lower validation loss than a single refined model. q0 reduces to three core primitives. A cyclic schedule with anti-correlated learning rate and weight decay collects diverse models from a few parallel trajectories. Chain distillation trains each model against its predecessor so that model quality compounds across the population. A learned prior, fit on a held-out set, selects and weights members for any inference budget.
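The three primitives can be illustrated with a minimal sketch. The abstract does not give the schedule shape, the distillation objective, or the way the prior is fit, so everything below is an assumption made for illustration: a cosine cycle in which weight decay rises as the learning rate falls, a distillation loss that blends cross-entropy with KL divergence to the predecessor, and member weights fit on held-out data with a standard EM update for mixture weights (a stand-in for the paper's learned prior, not its actual method). All function and parameter names are hypothetical.

```python
import math


def cyclic_schedule(step, cycle_len, lr_min, lr_max, wd_min, wd_max):
    """Return (lr, wd) at `step`; lr falls while wd rises within each cycle."""
    phase = (step % cycle_len) / cycle_len              # position in cycle, [0, 1)
    cos_term = 0.5 * (1.0 + math.cos(math.pi * phase))  # goes 1 -> 0 over a cycle
    lr = lr_min + (lr_max - lr_min) * cos_term          # highest at cycle start
    wd = wd_max - (wd_max - wd_min) * cos_term          # lowest at cycle start
    return lr, wd


def distill_loss(student_probs, teacher_probs, target, alpha):
    """Assumed objective: (1 - alpha) * CE(target) + alpha * KL(teacher || student)."""
    ce = -math.log(student_probs[target])
    kl = sum(t * math.log(t / s)
             for t, s in zip(teacher_probs, student_probs) if t > 0)
    return (1.0 - alpha) * ce + alpha * kl


def mixture_nll(weights, member_probs):
    """Mean negative log-likelihood of the weighted mixture.

    member_probs[k][i] is the probability member k gave to the correct
    token at held-out position i.
    """
    n = len(member_probs[0])
    total = 0.0
    for i in range(n):
        mix = sum(w * probs[i] for w, probs in zip(weights, member_probs))
        total -= math.log(mix)
    return total / n


def fit_mixture_weights(member_probs, iters=50):
    """Fit mixture weights on held-out data with EM, starting from uniform."""
    k, n = len(member_probs), len(member_probs[0])
    weights = [1.0 / k] * k
    for _ in range(iters):
        resp_sums = [0.0] * k
        for i in range(n):
            mix = sum(weights[j] * member_probs[j][i] for j in range(k))
            for j in range(k):
                # Responsibility of member j for held-out position i.
                resp_sums[j] += weights[j] * member_probs[j][i] / mix
        weights = [r / n for r in resp_sums]
    return weights


if __name__ == "__main__":
    # Toy numbers chosen only for illustration.
    print(cyclic_schedule(0, 100, 1e-5, 1e-3, 0.01, 0.1))
    held_out = [[0.9, 0.8, 0.7], [0.1, 0.2, 0.3]]
    w = fit_mixture_weights(held_out)
    print(w, mixture_nll(w, held_out))
```

In this sketch a snapshot would be saved at the end of each cycle, each new member would be trained with `distill_loss` against the previous snapshot, and `fit_mixture_weights` would be rerun on whichever subset of members fits the inference budget. Members whose weight collapses toward zero can simply be dropped.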
