Pressure-Testing Deception Probes in LLMs: Scaling, Robustness, and the Geometry of Deceptive Representations 事件

Name: Pressure-Testing Deception Probes in LLMs: Scaling, Robustness, and the Geometry of Deceptive Representations
Start: 2026-05-28

PRODUCT_LAUNCH2026-05-28影响: MEDIUM

Pressure-Testing Deception Probes in LLMs: Scaling, Robustness, and the Geometry of Deceptive Representations arXiv:2605.27958v1 Announce Type: new Abstract: Linear probes trained on LLM activations are increasingly proposed as deception-detection metrics, yet report AUROC exceeding 0.96 on clean benchmarks while collapsing under distributional shift. This paper systematically pressure-tests probe-based metrics across the Gemma 3 model family (1B-27B parameters), diagnosing why they fail rather

人工智能

关系图谱

Pressure-Testing Deception Probes in LLMs: Scaling, Robustness, and the Geometry of Deceptive Representations 事件

相关公司查看全部 (10)

相关人物查看全部 (1)

相关产品查看全部 (10)

相关技术查看全部 (10)

相关报道查看全部 (1)