Pretraining Language Models on Historical Text
Event: PRODUCT_LAUNCH | Date: 2026-06-03 | Impact: MEDIUM
arXiv:2606.02991v1. Abstract: We introduce TypewriterLM, a 7.24B History language model (LM) trained exclusively on English text predating 1913. Developing History LMs requires addressing challenges in data quality and availability, preventing temporal leakage, designing temporally consistent post-training pipelines, and constructing reliable evaluations. To address these issues, we construct TypewriterCorpus, a 54B-token historic
arXiv cs.CL, 2026-06-03