When Mean CE Fails: Median CE Can Better Track Language Model Quality

PRODUCT_LAUNCH · 2026-05-26 · Impact: MEDIUM

arXiv:2605.24667v1 Announce Type: new

Abstract: Mean cross-entropy is the standard validation metric for language models, but it can fail to track model quality during training. We examine this in two common scenarios. First, in Qwen2.5-1.5B SFT on synthetic fact-learning, we find that mean CE rises substantially after the initial learning phase while held-out fact-recall accuracy remains near its peak. Second, we find that i
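To make the contrast between the two metrics concrete, here is a minimal Python sketch. It assumes that "median CE" means the median of per-token cross-entropy losses over a validation set; the excerpt above does not state the paper's exact definition, so treat that as an assumption. The function names and the probability values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math
from statistics import mean, median


def token_cross_entropies(correct_token_probs):
    """Per-token cross-entropy in nats: -log of the probability
    the model assigned to the correct token."""
    return [-math.log(p) for p in correct_token_probs]


def summarize(correct_token_probs):
    """Return both aggregates over the same per-token losses."""
    losses = token_cross_entropies(correct_token_probs)
    return {"mean_ce": mean(losses), "median_ce": median(losses)}


# Hypothetical checkpoint A: moderately unsure on every token.
early = [0.5] * 10

# Hypothetical checkpoint B: confident and right on nine tokens,
# confidently wrong on one (a single heavy-tail loss).
late = [0.9] * 9 + [1e-6]

early_stats = summarize(early)
late_stats = summarize(late)

print("early:", early_stats)
print("late: ", late_stats)
```

In this toy setup the single confidently wrong token pulls the mean up from checkpoint A to checkpoint B, while the median falls because most tokens are predicted better. This is one way a mean can move against typical-token quality; whether it is the mechanism the authors identify cannot be confirmed from the truncated abstract.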
