"Chi nas dal soch el sent de legn" -- Auditing Text Corpora for Lombard 事件

PRODUCT_LAUNCH | 2026-06-05 | Impact: MEDIUM

"Chi nas dal soch el sent de legn" -- Auditing Text Corpora for Lombard arXiv:2606.06349v1 Announce Type: new Abstract: Several of the world's languages are still under-resourced in terms of Natural Language Processing (NLP) tools. This is mostly due to the lack of high-quality datasets to train, develop, and evaluate systems and models for several tasks, such as Machine Translation (MT). We conduct a manual audit of the parallel and monolingual corpora available for Lombard, an under-resourced