HaluNet: Learning Hallucination Risk from Internal Signals in LLM Question Answering

ArXiv CS.CL | 2026-05-29 | NEWS | en | Authors: Chaodong Tong, Qi Zhang, Zhuojun Jiang, Lei Jiang, Yanbing Liu

Details

Source: ArXiv CS.CL
Authors: Chaodong Tong, Qi Zhang, Zhuojun Jiang, Lei Jiang, Yanbing Liu
Article type: NEWS
Language: en
Published: 2026-05-29

Abstract

arXiv:2512.24562v2

Large language models (LLMs) achieve strong question answering (QA) performance but can produce fluent answers unsupported by available evidence. Existing hallucination detectors often rely on external verification, repeated sampling, or test-time judge calls, which can be costly for real-time QA. We propose HaluNet, a lightweight hallucination risk estimator that uses internal signals from one model generation. HaluNet jointly models token likelihood, predictive entropy, and hidden-state information, allowing probabilistic, distributional, and semantic evidence to inform an answer-level risk score. It is trained with LLM-as-a-Judge labels as scalable weak supervision and evaluated with independent human and multi-judge assessments. Experiments on SQuAD, TriviaQA, and Natural Questions show that HaluNet improves answer-level risk ranking across in-domain and out-of-domain settings.
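To make the three signal families concrete, here is a minimal Python sketch of how token likelihood, predictive entropy, and hidden states from a single generation could be combined into one answer-level score. It is not the HaluNet architecture, which the abstract does not specify. The function names (`token_signals`, `answer_features`, `risk_score`), the choice of summary statistics, the mean pooling of hidden states, and the logistic scorer are all assumptions made for illustration. In practice the weights would be fitted on judge-provided labels; here they are random placeholders.

```python
import numpy as np

def token_signals(logits, token_ids):
    """Per-token log-likelihood and predictive entropy from raw logits.

    logits:    (T, V) array of next-token logits for T generated tokens.
    token_ids: (T,) integer array of the tokens that were actually generated.
    """
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Log-probability the model assigned to each generated token.
    chosen = log_probs[np.arange(len(token_ids)), token_ids]
    # Shannon entropy (in nats) of each next-token distribution.
    entropy = -(np.exp(log_probs) * log_probs).sum(axis=-1)
    return chosen, entropy

def answer_features(logits, token_ids, hidden_states):
    """Build one feature vector per answer (illustrative feature choice).

    hidden_states: (T, D) array, e.g. one layer's states for the answer tokens.
    """
    chosen, entropy = token_signals(logits, token_ids)
    # Hypothetical summary statistics for the probabilistic and
    # distributional signals.
    scalars = np.array([chosen.mean(), chosen.min(), entropy.mean(), entropy.max()])
    # Mean pooling stands in for whatever semantic encoder is actually used.
    pooled = hidden_states.mean(axis=0)
    return np.concatenate([scalars, pooled])

def risk_score(features, weights, bias):
    """Logistic scorer mapping features to a risk value in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(features @ weights + bias)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, V, D = 6, 50, 8                       # answer length, vocab size, hidden size
    logits = rng.normal(size=(T, V))
    token_ids = rng.integers(0, V, size=T)
    hidden_states = rng.normal(size=(T, D))

    feats = answer_features(logits, token_ids, hidden_states)
    weights = rng.normal(size=feats.shape[0])  # placeholder, untrained weights
    print("risk score:", risk_score(feats, weights, bias=0.0))
```

Because every input comes from one forward generation, a scorer of this shape needs no repeated sampling or external judge call at inference time, which is the cost advantage the abstract describes.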
