Multilingual and Cross-Lingual Citation Needed Detection on Wikipedia for Lower-Resource Languages 文章

ArXiv CS.CL2026-06-01NEWSen作者: Gerrit Quaremba, Amy Rechkemmer, Elizabeth Black, Denny Vrande\v{c}i\'c, Elena Simperl

摘要

arXiv:2605.31136v1 Announce Type: new Abstract: In automated fact-checking (AFC), check-worthiness detection identifies claims requiring verification based on domain-specific criteria. On Wikipedia, this task instantiates as Citation Needed Detection (CND), which flags claims lacking supporting citations. However, existing research has largely overlooked lower-resource languages, and recent AFC pipelines rely on large language models (LLMs), which are inaccessible to low-resource organizations. We introduce MCN, a multilingual CND corpus spanning 18 languages across three resource levels, on which we conduct an extensive study of small decoder-based language models (SLMs). Our experiments show that SLMs fine-tuned with an encoder-style objective substantially outperform prompted LLMs across languages.