Asking Is Not Enough: Protocol Sensitivity in LLM Confidence Calibration (Event)

PRODUCT_LAUNCH | 2026-05-28 | Impact: MEDIUM

arXiv:2605.27752v1 | Announce Type: new

Abstract: LLM confidence calibration is often evaluated by comparing two signals: token-probability scores and verbalized confidence. These signals are sometimes treated as direct readouts of model uncertainty, but their comparison depends on measurement choices that are rarely made explicit. In the main analysis, we hold the verbalized-confidence elicitation fixed: a single prompt tem
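To make the two signals concrete, here is a minimal Python sketch that scores both with equal-width binned expected calibration error (ECE). It is not the paper's protocol. It assumes the model is prompted to end its answer with a line like `Confidence: 85%`. That output format, the helper names `parse_verbalized_confidence` and `expected_calibration_error`, the 10-bin default, and the toy numbers are all illustrative assumptions. The bin count and the parsing rule are themselves examples of the measurement choices the abstract says are rarely made explicit.

```python
import re
from typing import List, Optional, Sequence, Tuple

# Hypothetical output format: the prompt is assumed to ask the model to state
# "Confidence: NN%". Real elicitation templates vary, and so would this parser.
_CONFIDENCE_PATTERN = re.compile(
    r"confidence\s*[:=]\s*(\d{1,3}(?:\.\d+)?)\s*%", re.IGNORECASE
)


def parse_verbalized_confidence(text: str) -> Optional[float]:
    """Map a verbalized 'Confidence: NN%' statement to [0, 1]; None if absent or out of range."""
    match = _CONFIDENCE_PATTERN.search(text)
    if match is None:
        return None
    value = float(match.group(1)) / 100.0
    return value if 0.0 <= value <= 1.0 else None


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Equal-width binned ECE: sum over bins of (bin size / N) * |accuracy - mean confidence|."""
    if len(confidences) != len(correct) or len(confidences) == 0:
        raise ValueError("need equally long, non-empty confidences and correct")
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        if not 0.0 <= conf <= 1.0:
            raise ValueError(f"confidence {conf} outside [0, 1]")
        # A confidence of exactly 1.0 would index past the end, so clamp it into the top bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += len(members) / total * abs(accuracy - mean_conf)
    return ece


# Toy data for illustration only; these are not results from the paper.
token_probs = [0.95, 0.85, 0.65, 0.92]  # stand-in token-probability scores
verbal_outputs = [
    "Answer: A. Confidence: 95%",
    "Answer: C. Confidence: 95%",
    "Answer: B. Confidence: 85%",
    "Answer: D. Confidence: 75%",
]
correct = [True, True, False, True]

verbal_conf = [parse_verbalized_confidence(t) for t in verbal_outputs]
if any(c is None for c in verbal_conf):
    raise ValueError("some outputs had no parseable confidence")

print(f"token-probability ECE: {expected_calibration_error(token_probs, correct):.4f}")
print(f"verbalized ECE:        {expected_calibration_error(verbal_conf, correct):.4f}")
```

Rerunning the same toy data with `n_bins=1` collapses each score into a single overall confidence-versus-accuracy gap and yields different numbers. This is one small way to see that the reported calibration depends on the measurement protocol as well as on the model.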