Faithfulness Metrics Don't Measure Faithfulness: A Meta-Evaluation with Ground Truth

arXiv cs.CL, 2026-05-26. Authors: Yoav Gur-Arieh, Ana Marasović, Mor Geva

Abstract

arXiv:2605.25052v1 (new submission).

Chains of thought (CoTs) have become central to interpreting and auditing the behavior of large language models. Yet growing evidence suggests that these traces often fail to faithfully represent the computations behind a model's predictions. Several faithfulness metrics have been proposed, but whether they actually measure faithfulness remains unknown. Answering this question requires ground-truth labels. Such labels are hard to obtain because a model's internal computations are not directly observable. Consequently, most works that propose metrics report only absolute scores or comparisons to prior metrics. The few existing benchmarks rely on proxies such as plausibility or importance. These properties are orthogonal to faithfulness and can give a misleading picture of whether a CoT can be trusted.
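A meta-evaluation with ground truth can be made concrete as follows. Suppose each CoT carries a ground-truth label (faithful or unfaithful). We can then ask how well a metric's scores separate the two groups. The sketch below is hypothetical and is not the authors' protocol. The function name `meta_auroc`, the toy scores and labels, and the use of AUROC as the agreement measure are all assumptions chosen for illustration.

```python
# Hypothetical meta-evaluation sketch: NOT the paper's protocol.
# Assumption: each CoT has a ground-truth label (1 = faithful, 0 = unfaithful)
# and a faithfulness metric assigns it a real-valued score (higher = more faithful).

def meta_auroc(scores, labels):
    """Return the probability that a randomly chosen faithful CoT receives a
    higher metric score than a randomly chosen unfaithful one (ties count 0.5).
    This pairwise formulation equals the AUROC; 0.5 means the metric carries
    no information about the ground-truth labels."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one faithful and one unfaithful example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, made-up numbers for illustration only.
labels = [1, 1, 1, 0, 0, 0]
metric_a = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]  # perfectly separates the labels
metric_b = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]  # constant score: uninformative

print(meta_auroc(metric_a, labels))  # 1.0
print(meta_auroc(metric_b, labels))  # 0.5
```

In this framing, a metric that scores near 0.5 against ground-truth labels would not be measuring faithfulness, however high or stable its absolute scores look. This is exactly the information that reporting absolute scores alone cannot reveal.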
