When Mean CE Fails: Median CE Can Better Track Language Model Quality

ArXiv CS.AI · 2026-05-26 · Authors: Hao Guo, Simon Dennis, Rivaan Patil, Kevin Shabahang
