BioELX: Cross-lingual Biomedical Entity Linking via Alias-based Retrieval and LLM Ranking 文章

ArXiv CS.CL2026-05-28NEWSen作者: Yi Wang, Corina Dima, Liangyu Zhong, Steffen Staab

摘要

arXiv:2605.27380v1 Announce Type: new Abstract: Cross-lingual biomedical entity linking (BEL) maps mentions in any language to unique identifiers in a biomedical knowledge base (KB), supporting clinical and biomedical NLP applications. However, expert-annotated training data for BEL are costly, especially for low-resource languages. Moreover, many cross-lingual BEL systems rely on SapBERT-based retrievers trained on predominantly English aliases in the KB, leading to poor generalization to unseen non-English mentions and limited context-aware disambiguation. We propose BioELX, a two-stage cross-lingual BEL framework that requires no task-specific annotated training corpora. In Stage~1, we enrich SapBERT training with Wikidata-derived multilingual aliases and use the resulting retriever to improve cross-lingual candidate retrieval.