Evaluating and Calibrating LLM Confidence on Questions with Multiple Correct Answers
Event: PRODUCT_LAUNCH · Date: 2026-06-03 · Impact: MEDIUM
arXiv:2602.07842v2 · Announce Type: replace
Abstract: Confidence calibration is essential for making large language models (LLMs) reliable, yet existing training-free methods have been primarily studied under single-answer question answering. In this paper, we show that these methods break down in the presence of multiple valid answers, where disagreement among equally correct responses leads to systematic undere
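
To illustrate the failure mode the abstract describes, here is a minimal sketch of one common training-free signal: agreement among sampled answers. It is not taken from the paper. The function names, the toy questions, and the sampled answers are all hypothetical, and the oracle check in `fraction_correct` assumes the full set of valid answers is known, which holds only in an evaluation setting.

```python
from collections import Counter


def normalize(answer: str) -> str:
    """Lowercase and trim so that trivially different strings compare equal."""
    return answer.strip().lower()


def agreement_confidence(samples: list[str]) -> tuple[str, float]:
    """Return the most frequent answer and the fraction of samples that match it.

    This is a generic agreement-based confidence score (an assumption for
    illustration), not the method proposed in the paper.
    """
    counts = Counter(normalize(s) for s in samples)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(samples)


def fraction_correct(samples: list[str], valid_answers: set[str]) -> float:
    """Oracle diagnostic: share of samples that fall in the known valid set."""
    valid = {normalize(v) for v in valid_answers}
    return sum(normalize(s) in valid for s in samples) / len(samples)


# Hypothetical single-answer question: every sample agrees.
single_samples = ["Paris"] * 10
single_answer, single_conf = agreement_confidence(single_samples)

# Hypothetical multi-answer question ("name a primary color"):
# every sample is correct, but the samples disagree with each other.
multi_samples = ["red"] * 4 + ["blue"] * 3 + ["yellow"] * 3
multi_answer, multi_conf = agreement_confidence(multi_samples)
multi_correct = fraction_correct(multi_samples, {"red", "blue", "yellow"})

print(single_answer, single_conf)
print(multi_answer, multi_conf, multi_correct)
```

In the toy multi-answer case every sampled response is correct, yet the agreement score for the top answer is well below that of the single-answer case, because the probability mass is split across several valid answers.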