Child-directed speech facilitates production, not comprehension, in BabyLMs

ArXiv CS.CL | 2026-06-02 | NEWS | en | Authors: Bastian Bunzeck, Sina Zarrieß

Details

Source site
ArXiv CS.CL
Authors
Bastian Bunzeck, Sina Zarrieß
Article type
NEWS
Language
en
Publication date
2026-06-02

Abstract

arXiv:2606.01045v1 (announce type: new)

Recent studies suggest that child-directed speech (CDS) is not conducive to language learning in BabyLMs. However, current evaluations focus predominantly on comprehension rather than production, even though production is central to usage-based theories of language acquisition, which argue that CDS facilitates early language use through constructional "frames" (frequent lexical patterns with open slots). We introduce a novel generation-based evaluation inspired by such theories, in the form of a frame-completion task, and compare Llama models trained on CDS, the BabyLM corpus, and web-crawl data (FineWeb-edu) on comprehension benchmarks and our novel framework. Our results reveal a clear dissociation between models' comprehension and production capabilities: while FineWeb-trained models excel at minimal pairs, CDS-trained models produce grammatical completions substantially earlier in training and concentrate probability mass on appropriate slot-fillers.
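To make the idea of concentrating probability mass on appropriate slot-fillers concrete, here is a minimal Python sketch. It is an illustration only, not the authors' evaluation code: the function name `slot_filler_mass`, the example frame, the toy scores, and the set of "appropriate" fillers are all assumptions invented for this example. It applies a softmax to hypothetical next-word scores for a frame's open slot and sums the probability assigned to the words counted as acceptable fillers.

```python
import math

def slot_filler_mass(logits, fillers):
    """Return the softmax probability mass assigned to `fillers`.

    `logits` maps each candidate next word to an unnormalised score
    (hypothetical toy values here, not real model output).
    """
    # Subtract the max logit for numerical stability before exponentiating.
    peak = max(logits.values())
    exp_scores = {w: math.exp(s - peak) for w, s in logits.items()}
    total = sum(exp_scores.values())
    # Sum the normalised probabilities of the words counted as appropriate.
    return sum(exp_scores[w] for w in fillers if w in exp_scores) / total

# Made-up frame "Where is the ___ ?" with invented scores for four candidates.
toy_logits = {"ball": 2.0, "dog": 2.0, "the": 0.0, "is": 0.0}
noun_fillers = {"ball", "dog"}  # assumed set of appropriate slot-fillers
mass = slot_filler_mass(toy_logits, noun_fillers)
print(f"mass on appropriate fillers: {mass:.3f}")
```

A value near 1 would mean almost all probability goes to acceptable fillers, while a value near the fillers' share of the vocabulary would indicate no preference. In a real setting the scores would come from a trained model's next-token distribution rather than hand-written numbers.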
