When LLMs Learn to Be Consistently Wrong: A Multi-Model Study of Linear Representations of Synthetic Deception 事件

Name: When LLMs Learn to Be Consistently Wrong: A Multi-Model Study of Linear Representations of Synthetic Deception
Start: 2026-06-01

REGULATION2026-06-01影响: MEDIUM

When LLMs Learn to Be Consistently Wrong: A Multi-Model Study of Linear Representations of Synthetic Deception arXiv:2605.30381v1 Announce Type: cross Abstract: Deceptive alignment, in which models maintain accurate internal representations while deliberately producing false outputs, remains a central challenge in AI safety. While strategic deception is the primary long-term concern, synthetic dishonesty - induced via direct optimization on incorrect answers - provides a controlled testbed for

人工智能

关系图谱

When LLMs Learn to Be Consistently Wrong: A Multi-Model Study of Linear Representations of Synthetic Deception 事件

相关公司查看全部 (10)

相关人物查看全部 (3)

相关产品查看全部 (10)

相关技术查看全部 (10)

相关报道查看全部 (1)