HardMTBench: Stress-Testing Chinese-English Translation on Knowledge-Intensive Domains

arXiv cs.CL | 2026-05-28 | Authors: Zheng Li, Mao Zheng, Mingyang Song, Tianxiang Fei

Abstract

arXiv:2605.28315v1. General-purpose machine translation benchmarks such as FLORES-200 have reached a saturation regime on Chinese-English pairs, where modern large language models cluster within a narrow band of high scores. Across 22 systems, FLORES-200 zh-en GEMBA scores fall in a 7.87-point range with a standard deviation of 2.29, which compresses the separation between systems on knowledge-intensive domains such as finance, healthcare, law, and science and technology. We introduce HardMTBench, a difficulty-aware diagnostic benchmark for bidirectional Chinese-English domain translation. HardMTBench covers 12 domains and contains 10,000 hand-curated source sentences with reference translations, packaged as 20,000 directional test items.
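The saturation argument rests on two spread statistics over per-system scores: the range and the standard deviation. A minimal sketch of how such statistics could be computed is shown below. The function name `score_spread` and the score list are hypothetical and invented for illustration; they are not the paper's data. The abstract does not say whether the reported standard deviation is the population or the sample form, so the population form is assumed here.

```python
import statistics


def score_spread(scores):
    """Return (range, population standard deviation) of per-system scores.

    Assumption: population standard deviation (statistics.pstdev). The
    abstract does not state which form the authors used; swap in
    statistics.stdev for the sample form.
    """
    if len(scores) < 2:
        raise ValueError("need at least two system scores")
    return max(scores) - min(scores), statistics.pstdev(scores)


# Hypothetical scores, invented purely for illustration; NOT from the paper.
illustrative_scores = [90.0, 92.0, 94.0, 96.0]

spread, std = score_spread(illustrative_scores)
print(f"range={spread:.2f}, std={std:.2f}")
```

A small range and standard deviation relative to the metric's scale would indicate that the benchmark separates systems poorly, which is the sense of saturation described above.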
