Procedural Pretraining: Warming Up Language Models with Abstract Data

ArXiv CS.CL, 2026-05-29. Authors: Liangze Jiang, Zachary Shinnick, Anton van den Hengel, Hemanth Saratchandran, Damien Teney

Abstract

arXiv:2601.21725v2. Announce type: replace.

Pretraining language models directly on web-scale corpora is the de facto paradigm. We study an alternative where the model is initially exposed to abstract structured data to ease the subsequent acquisition of rich semantic knowledge, much like humans learning simple logic and mathematics before higher reasoning. As this abstract data, we focus on procedural data generated by formal languages and other simple algorithms. We first diagnose the algorithmic skills that different forms of procedural data can improve, often significantly. For example, the accuracy of context recall (Needle-in-a-haystack) jumps from 10% to 98% when a model is pretrained on Dyck sequences (balanced brackets). Second, we study how these gains are reflected in pretraining larger models (up to 1.3B). We find that front-loading as little as 0.1 to 0.
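To make the Dyck sequences mentioned in the abstract concrete, here is a minimal sketch of how balanced-bracket strings could be sampled. It is an illustration only, not the authors' data generator: the function names `generate_dyck` and `is_balanced`, the bracket alphabet, and the 50/50 open-or-close rule are all assumptions introduced for this example.

```python
import random


def generate_dyck(num_pairs, num_bracket_types=2, rng=None):
    """Sample one balanced bracket string containing `num_pairs` pairs.

    Hypothetical helper: the sampling rule (fair coin between opening and
    closing whenever both are legal) is an assumption, not the paper's.
    `num_bracket_types` must be between 1 and 4.
    """
    rng = rng or random.Random()
    brackets = ["()", "[]", "{}", "<>"][:num_bracket_types]
    stack = []   # closing symbols still owed, innermost last
    out = []
    opened = 0
    while len(out) < 2 * num_pairs:
        can_open = opened < num_pairs
        can_close = bool(stack)
        if can_open and (not can_close or rng.random() < 0.5):
            pair = rng.choice(brackets)
            out.append(pair[0])
            stack.append(pair[1])
            opened += 1
        else:
            # Close the most recently opened bracket.
            out.append(stack.pop())
    return "".join(out)


def is_balanced(s):
    """Return True if every bracket in `s` is closed in the right order."""
    closers = {")": "(", "]": "[", "}": "{", ">": "<"}
    stack = []
    for ch in s:
        if ch in closers:
            if not stack or stack.pop() != closers[ch]:
                return False
        else:
            stack.append(ch)
    return not stack


if __name__ == "__main__":
    sample = generate_dyck(num_pairs=8, num_bracket_types=3, rng=random.Random(0))
    print(sample, is_balanced(sample))
```

Each string produced this way has length `2 * num_pairs`, and matching a closing bracket requires remembering which opening bracket is still pending, which is one plausible reason such data exercises context recall.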
