Cordyceps: Covert Control Attacks on LLMs via Data Poisoning

Details

Source: ArXiv CS.AI
Authors: Zedian Shao, Charles Fleming, Teodora Baluta
Article type: NEWS
Language: en
Published: 2026-05-27

Abstract

arXiv:2605.26595v1 (cross-listed)

Large language models (LLMs) are often fine-tuned on uncurated text datasets that adversaries can poison. Existing poisoning attacks primarily rely on fixed trigger phrases that defenses such as outlier detection, clean-data regularization, or online monitoring can neutralize. In this paper, we propose a data poisoning method that teaches an LLM an information hiding scheme reliably and stealthily through semantic associations between shared knowledge such as facts or concepts and attacker-chosen phrases. The induced hiding scheme can encode and decode arbitrary malicious instructions, thus revealing a new and subtle poisoning-induced vulnerability: covert control attacks. We precisely characterize covert control attacks and evaluate them across 5 LLMs, 3 backdoor defenses, and 4 prompt injection defenses.
