Calibration Is Not Enough: Evaluating Confidence Estimation Under Language Variations

Event: PRODUCT_LAUNCH | 2026-05-29 | Impact: MEDIUM

arXiv:2601.08064v2 Announce Type: replace

Abstract: Confidence estimation (CE) indicates how reliable the answers of large language models are and impacts user trust and decision-making. Existing evaluations mainly concern the alignment between confidence and correctness, but ignore the variability of language: confidence estimates should remain consistent under semantically equivalent prompts or answer variat