XLGoBench: Detecting cross-lingual skill gaps with algorithmic tasks

Event: PRODUCT_LAUNCH | Date: 2026-06-01 | Impact: MEDIUM

arXiv:2605.30788v1 Announce Type: new

Abstract: We introduce a set of synthetic algorithmic tasks to detect cross-lingual gaps in the abilities of large language models. Our benchmark is commensurate across languages, since it requires models to perform the same underlying task in different languages; scalable, since each task can be generated at varying levels of complexity, allowing it to be adapted to models with different c
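
To illustrate the two properties the abstract names (the same underlying task posed in several languages, and a tunable complexity level), here is a minimal sketch. It is not the benchmark's actual code: the sorting task, the prompt templates, and the names `TEMPLATES`, `make_instance`, and `accuracy_gap` are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical prompt templates. The real benchmark's tasks and wording
# are not given in the abstract; these are illustrative assumptions.
TEMPLATES = {
    "en": "Sort these numbers in ascending order: {items}",
    "es": "Ordena estos números de menor a mayor: {items}",
    "de": "Sortiere diese Zahlen aufsteigend: {items}",
}


def make_instance(length, seed):
    """Build one task instance whose underlying problem is identical in every language.

    `length` acts as the complexity knob: longer lists are harder to sort.
    """
    rng = random.Random(seed)
    numbers = rng.sample(range(100), length)  # distinct integers, length <= 100
    items = ", ".join(str(n) for n in numbers)
    prompts = {lang: tpl.format(items=items) for lang, tpl in TEMPLATES.items()}
    return {"numbers": numbers, "answer": sorted(numbers), "prompts": prompts}


def accuracy_gap(scores, reference="en"):
    """Return per-language accuracy shortfall relative to a reference language."""
    return {
        lang: scores[reference] - score
        for lang, score in scores.items()
        if lang != reference
    }
```

Because every language shares the same numbers and the same expected answer, any difference in accuracy between languages can be attributed to the language of the prompt rather than to the difficulty of the instance; increasing `length` raises the difficulty uniformly across languages.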