The Multilingual Curse at the Retrieval Layer: Evidence from Amharic

arXiv cs.CL | 2026-05-26 | News | en | Authors: Yosef Worku Alemneh, Kidist Amde Mekonnen, Maarten de Rijke

Abstract

arXiv:2605.24556v1 (announce type: cross)

Multilingual retrieval increasingly underpins cross-lingual question answering and retrieval-augmented generation. Strong zero-shot scores on multilingual benchmarks are often taken as evidence that current encoders transfer reliably across many languages. We argue that this assumption breaks down for underrepresented, morphologically rich languages, and use Amharic as a diagnostic case. Under a shared passage retrieval protocol covering dense, late-interaction, learned sparse, and cross-encoder paradigms, we compare zero-shot multilingual retrievers, Amharic-fine-tuned multilingual retrievers, and monolingual Amharic retrievers. The strongest zero-shot multilingual retriever underperforms the strongest monolingual Amharic first-stage retriever by 23% relative MRR@10.
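
For readers unfamiliar with the metric, the following is a minimal sketch of how MRR@10 and a relative gap between two systems are commonly computed. It is not the authors' evaluation code. The function names `mrr_at_k` and `relative_gap`, the passage ids, and the toy rankings are all hypothetical, and the sketch assumes the gap is expressed relative to the stronger (monolingual) score, which the abstract does not state explicitly.

```python
def mrr_at_k(rankings, relevant, k=10):
    """Mean reciprocal rank, counting only the top-k results per query.

    rankings: one ranked list of passage ids per query (best first).
    relevant: one set of relevant passage ids per query.
    """
    total = 0.0
    for ranked, gold in zip(rankings, relevant):
        for rank, pid in enumerate(ranked[:k], start=1):
            if pid in gold:
                total += 1.0 / rank  # only the first relevant hit counts
                break
    return total / len(rankings)


def relative_gap(reference, other):
    """Fraction by which `other` falls short of `reference`.

    Assumption: the gap is normalized by the reference (stronger) score.
    """
    return (reference - other) / reference


# Toy data, invented purely for illustration (not from the paper).
toy_rankings = [["p3", "p1", "p7"], ["p2", "p9", "p4"], ["p5", "p6", "p8"]]
toy_relevant = [{"p1"}, {"p2"}, {"p0"}]

toy_mrr = mrr_at_k(toy_rankings, toy_relevant)  # (1/2 + 1/1 + 0) / 3
```

Under this reading, a "23% relative" gap would mean `relative_gap(monolingual_mrr, multilingual_mrr)` is about 0.23, which is different from a 23-point absolute difference in MRR@10.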
