PersianMedQA: Evaluating Large Language Models on a Persian-English Bilingual Medical Question Answering Benchmark 事件
BREAKTHROUGH2026-05-27影响: HIGH
PersianMedQA: Evaluating Large Language Models on a Persian-English Bilingual Medical Question Answering Benchmark arXiv:2506.00250v4 Announce Type: replace Abstract: Large Language Models (LLMs) have achieved remarkable performance on a wide range of Natural Language Processing (NLP) benchmarks, often surpassing human-level accuracy. However, their reliability in high-stakes domains such as medicine, particularly in low-resource languages, remains underexplored. In this work, we introduce Pers
相关人物
暂无数据