MIRA: Mid-training Rubric Anchoring for Source-Aware Data Selection
Event: PRODUCT_LAUNCH
Date: 2026-05-29
Impact: MEDIUM
arXiv:2605.30288v1 Announce Type: new
Abstract: Mid-training has become an important stage in modern LLM development, using large-scale curated mixtures to strengthen capabilities before final post-training. Its data selection problem is distinct: the data are optimized under a pretraining-style objective at near-pretraining scale, but are curated toward downstream capabilities and drawn from heterogeneous sources with differen