Measuring Epistemic Humility in Multimodal Large Language Models

arXiv cs.CV | 2026-05-26 | Authors: Bingkui Tong, Jiaer Xia, Sifeng Shang, Kaiyang Zhou

Abstract

arXiv:2509.09658v2

Hallucinations in multimodal large language models (MLLMs), where the model generates content inconsistent with the input image, pose significant risks in real-world applications, from misinformation in visual question answering to unsafe errors in decision-making. Existing benchmarks primarily test recognition accuracy, i.e., whether models can select the correct answer among distractors. This overlooks another capability that matters for trustworthy AI: recognizing when none of the provided options is supported by the image and declining to commit to a false choice, a humility-related behavior. We present HumbleBench, a new hallucination benchmark designed to evaluate false-option rejection in MLLMs under a forced-choice multiple-choice setting with a "None of the above" option.
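To make the evaluation setting concrete, here is a minimal sketch of how answers might be scored when a "None of the above" option is present. It is not the authors' evaluation code: the `Item` record, the `score` function, and the assumption that "None of the above" is labeled `E` are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass

# Assumption: "None of the above" is the last option and is labeled "E".
NOTA = "E"


@dataclass
class Item:
    answer: str      # ground-truth option letter
    prediction: str  # option letter chosen by the model


def accuracy(items):
    """Fraction of items where the prediction matches the answer."""
    if not items:
        return 0.0
    return sum(i.prediction == i.answer for i in items) / len(items)


def score(items):
    """Report accuracy overall and split by whether NOTA is the correct answer."""
    nota_items = [i for i in items if i.answer == NOTA]
    regular_items = [i for i in items if i.answer != NOTA]
    return {
        "overall": accuracy(items),
        "regular": accuracy(regular_items),  # recognition: pick the right option
        "nota": accuracy(nota_items),        # rejection: refuse all false options
    }


# Hypothetical predictions, for illustration only.
items = [Item("A", "A"), Item("E", "B"), Item("E", "E"), Item("C", "C")]
result = score(items)
```

Reporting the two subsets separately is one way to distinguish a model that recognizes correct options well from one that also rejects false options when none is supported by the image.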
