AlbanianLLMSafety: A Safety Evaluation Dataset for Large Language Models in Albanian

arXiv cs.CL, 2026-05-27. Authors: Wajdi Zaghouani, Kholoud K. Aldous, Isra Fejzullaj

Abstract

arXiv:2605.26954v1 (announce type: new)

Safety evaluation of Large Language Models (LLMs) has largely focused on high-resource languages, leaving low-resource languages critically underserved. We present AlbanianLLMSafety, the first publicly available safety evaluation dataset for LLMs in Albanian, a linguistically distinct low-resource language with approximately 7.5 million speakers across Albania, Kosovo, North Macedonia, and the diaspora. The dataset contains 2,951 prompts spanning 11 safety categories, including self-harm, violence, racist content, child exploitation, and radicalization, with an average of 268 prompts per category. Each prompt is provided in Albanian with an English reference translation and a detailed category label. This resource addresses a significant gap in safety evaluation infrastructure for low-resource languages and provides an essential benchmark for developing safer, more inclusive LLMs.
