Faithfulness Evaluation for Decoder-only LLM Attributions with Controlled Retained Information

arXiv cs.CL | 2026-05-27 | Authors: Xin Huang, Antoni B. Chan

Abstract

arXiv:2601.03089v2 (announce type: replace)

Large Language Models (LLMs) are increasingly evaluated with input attribution methods, yet comparing such explanations remains challenging. Existing soft-perturbation faithfulness metrics, such as Soft-NC and Soft-NS, can conflate attribution quality with the number of words retained during perturbation: attribution methods with larger average scores may keep more words and therefore obtain inflated scores. To address this issue, we propose $\pi$-Soft-NC and $\pi$-Soft-NS, an evaluation framework that compares attribution methods under the same expected retaining probability, thus controlling the number of retained words. We further introduce Grad-ELLM, a gradient-based attribution method tailored to autoregressive decoder-only LLMs, which combines gradient-derived channel importance with attention-derived token importance at each decoding step.
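The confound described in the abstract can be illustrated with a toy sketch. It assumes a token-level simplification in which each attribution score is used directly as a Bernoulli keep probability, so a method with larger average scores retains more tokens. The function `rescale_to_budget` and its bisection-based scaling are hypothetical and are not taken from the paper; they only show one way two methods could be put on the same expected retaining probability `pi` before being compared.

```python
import numpy as np


def expected_retained_fraction(keep_probs):
    """Average keep probability, i.e. the expected fraction of tokens retained."""
    return float(np.mean(np.clip(keep_probs, 0.0, 1.0)))


def rescale_to_budget(scores, pi, iters=60):
    """Hypothetical calibration (an assumption, not the paper's procedure):
    find a scale c such that mean(clip(c * scores, 0, 1)) equals pi."""
    s = np.clip(np.asarray(scores, dtype=float), 0.0, None)
    if not 0.0 < pi < 1.0:
        raise ValueError("pi must lie strictly between 0 and 1")
    if np.mean(s > 0) < pi:
        raise ValueError("too few positive scores to reach the requested budget")
    lo, hi = 0.0, 1.0
    # Grow the upper bound until the budget is reachable, then bisect.
    while np.mean(np.clip(hi * s, 0.0, 1.0)) < pi:
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.mean(np.clip(mid * s, 0.0, 1.0)) < pi:
            lo = mid
        else:
            hi = mid
    return np.clip(hi * s, 0.0, 1.0)


def sample_keep_mask(keep_probs, seed=0):
    """Draw one Bernoulli keep/drop decision per token (True means keep)."""
    rng = np.random.default_rng(seed)
    probs = np.asarray(keep_probs, dtype=float)
    return rng.random(probs.shape) < probs


# Two made-up attribution vectors over the same four tokens.
method_a = np.array([0.9, 0.8, 0.7, 0.6])    # large average score
method_b = np.array([0.3, 0.2, 0.1, 0.05])   # small average score

raw_a = expected_retained_fraction(method_a)
raw_b = expected_retained_fraction(method_b)

# After calibration both methods retain the same expected fraction of tokens.
cal_a = rescale_to_budget(method_a, pi=0.5)
cal_b = rescale_to_budget(method_b, pi=0.5)
```

Used uncalibrated, `method_a` keeps far more tokens on average than `method_b`, which is the kind of advantage the abstract attributes to Soft-NC and Soft-NS. After the assumed rescaling, both vectors share the same expected retained fraction while each keeps its own token ranking, so any remaining score difference would reflect which tokens are kept rather than how many.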
