Evaluating and Calibrating LLM Confidence on Questions with Multiple Correct Answers

Event: PRODUCT_LAUNCH | 2026-06-03 | Impact: MEDIUM

arXiv:2602.07842v2 (Announce Type: replace)

Abstract: Confidence calibration is essential for making large language models (LLMs) reliable, yet existing training-free methods have been primarily studied under single-answer question answering. In this paper, we show that these methods break down in the presence of multiple valid answers, where disagreement among equally correct responses leads to systematic undere