LaCy: What Small Language Models Can and Should Learn is Not Just a Question of Loss

arXiv cs.CL · 2026-06-01 · News · en · Authors: Szilvia Ujváry, Louis Béthune, Pierre Ablin, João Monteiro, Marco Cuturi, Michael Kirchhof

Abstract

arXiv:2602.12005v4 (Announce Type: replace)

Language models have consistently grown to compress more world knowledge into their parameters, but the knowledge that can be pretrained into them is upper-bounded by their parameter size. The capacity of Small Language Models (SLMs) is especially limited, which leads to factually incorrect generations. This problem is often mitigated by giving the SLM access to an outside source: the ability to query a larger model, documents, or a database. In this setting, we study the fundamental question of which tokens an SLM can and should learn during pretraining, versus which ones it should delegate via a \texttt{} token. We find that this is not simply a question of loss: although the loss is predictive of whether a predicted token mismatches the ground truth, it is insufficient for identifying which predictions would actually lead to factually or semantically invalid continuations.
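To make the loss-only criterion concrete, here is a minimal Python sketch of the kind of rule the abstract argues is insufficient: score each position by its cross-entropy loss, flag high-loss positions for delegation, and replace those training targets with a special delegation token. It assumes per-position predicted distributions and ground-truth token ids are available. The function names, the `threshold` value, and the delegation token id `delegate_id` are hypothetical illustrations; this is not the LaCy method.

```python
import math
from typing import List, Sequence


def token_losses(probs: Sequence[Sequence[float]], targets: Sequence[int]) -> List[float]:
    """Per-position cross-entropy: -log p(target) under the model's distribution."""
    if len(probs) != len(targets):
        raise ValueError("probs and targets must have the same length")
    return [-math.log(p[t]) for p, t in zip(probs, targets)]


def argmax_mismatches(probs: Sequence[Sequence[float]], targets: Sequence[int]) -> List[bool]:
    """True where the model's most likely token differs from the ground truth."""
    return [max(range(len(p)), key=p.__getitem__) != t for p, t in zip(probs, targets)]


def loss_based_delegation(probs, targets, threshold: float) -> List[bool]:
    """Hypothetical loss-only rule: delegate every position whose loss exceeds threshold."""
    return [loss > threshold for loss in token_losses(probs, targets)]


def build_training_targets(targets: Sequence[int], delegate_mask: Sequence[bool],
                           delegate_id: int) -> List[int]:
    """Swap delegated targets for a (hypothetical) delegation-token id."""
    return [delegate_id if d else t for t, d in zip(targets, delegate_mask)]


if __name__ == "__main__":
    # Toy vocabulary of 3 tokens (ids 0-2); id 3 stands in for the delegation token.
    probs = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.05, 0.9, 0.05]]
    targets = [0, 0, 1]
    mask = loss_based_delegation(probs, targets, threshold=1.0)
    print(mask)                                    # [False, True, False]
    print(argmax_mismatches(probs, targets))       # [False, True, False]
    print(build_training_targets(targets, mask, delegate_id=3))  # [0, 3, 1]
```

In this toy case the high-loss position is also the one where the argmax prediction mismatches the ground truth, which matches the abstract's observation that loss predicts mismatches. What the rule cannot see is whether the mismatched prediction would make the continuation factually or semantically invalid; for instance, a high-loss token might be a harmless paraphrase. Per the abstract, deciding that requires more than the loss.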
