Hallucination Is Linearly Decodable from Mid-Layer Hidden States in Quantized LLMs (Event)
OPEN_SOURCE | 2026-06-03 | Impact: MEDIUM
arXiv:2606.02628v1. Announce Type: cross.
Abstract: We investigate whether open-source LLMs encode a linearly separable truthfulness signal in their hidden states, and at which network depth this signal is strongest. Across three 7B to 8B instruction-tuned models (Llama-3.1-8B, Mistral-7B, Qwen2.5-7B) loaded in 4-bit NF4 quantization, we extract per-layer hidden states on four hallucination benchmarks (Truthful
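
To make the idea of a per-layer linear probe concrete, here is a minimal sketch on synthetic data. It is not the paper's pipeline: the "hidden states" are Gaussian stand-ins, the per-layer signal strengths are hypothetical values chosen only for illustration, and every function name (`make_layer_data`, `fit_probe`, `probe_by_layer`) is an assumption introduced here. A real experiment would replace the synthetic vectors with activations extracted from the quantized models.

```python
import math
import random


def make_layer_data(n, dim, signal, rng):
    """Synthetic stand-in for one layer's hidden states (NOT real activations).

    Label 1 shifts the mean by +signal along a random unit direction and
    label 0 by -signal, so a larger signal means easier linear separation.
    """
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(d * d for d in direction))
    direction = [d / norm for d in direction]
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        shift = signal if label == 1 else -signal
        X.append([rng.gauss(0.0, 1.0) + shift * d for d in direction])
        y.append(label)
    return X, y


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_probe(X, y, epochs=200, lr=0.1):
    """Fit a logistic-regression probe with full-batch gradient descent."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy(w, b, X, y):
    """Fraction of examples whose predicted label matches the true label."""
    correct = 0
    for xi, yi in zip(X, y):
        pred = 1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0
        correct += int(pred == yi)
    return correct / len(X)


def probe_by_layer(signals, n_train=200, n_test=200, dim=8, seed=0):
    """Train one probe per layer and return held-out accuracy for each."""
    accs = []
    for layer, signal in enumerate(signals):
        rng = random.Random(seed + layer)
        X, y = make_layer_data(n_train + n_test, dim, signal, rng)
        w, b = fit_probe(X[:n_train], y[:n_train])
        accs.append(accuracy(w, b, X[n_train:], y[n_train:]))
    return accs


if __name__ == "__main__":
    # Hypothetical signal strengths, set to peak at the third "layer".
    for layer, acc in enumerate(probe_by_layer([0.0, 0.5, 1.5, 0.5])):
        print(f"layer {layer}: held-out probe accuracy = {acc:.3f}")
```

Comparing held-out accuracy across layers is what locates the depth at which the signal is most linearly separable. Swapping `make_layer_data` for real per-layer activations would leave the probing loop unchanged.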