Conceptual Steganography 文章

ArXiv CS.CL2026-05-27NEWSen作者: Zhejian Zhou, Jonathan May

摘要

arXiv:2605.26537v1 Announce Type: new Abstract: Language Models (LMs) emit Chains-of-Thought (CoTs) that drive much of their capability. However, the same sequence that carries useful reasoning can also covertly convey messages: a misaligned model may embed covert information in its CoT that slips through human supervision, a form of steganography known as encoded reasoning. Prior LM steganography schemes operate in the token or lexical space, and a content-preserving paraphraser is the canonical and effective defense in recent work. We introduce conceptual steganography, in which each step of a CoT carries information through patterns of high-level reasoning behavior, rather than through lexical choice.

Conceptual Steganography 文章

摘要

相关事件查看全部 (1)

相关公司查看全部 (2)

相关人物

相关产品查看全部 (8)

相关技术查看全部 (22)