Towards Reliable Multilingual LLMs-as-a-Judge: An Empirical Study

Event: PRODUCT_LAUNCH, 2026-05-28. Impact: MEDIUM

arXiv:2605.28710v1
Announce Type: new

Abstract: Large language models (LLMs) are increasingly used for the automatic evaluation of generated text, yet most prior work focuses on English. Despite the growing demand for multilingual evaluation, extending LLM-based evaluators to multilingual settings remains challenging, particularly for low-resource languages and scenarios where in-domain data is scarce. This work explores several