Evaluating Prompting-Based Defenses Against Domain-Camouflaged Injection Attacks

ArXiv CS.CL | 2026-06-18 | NEWS | en | Author: Aaditya Pai

Details

Source site
ArXiv CS.CL
Author
Aaditya Pai
Article type
NEWS
Language
en
Publication date
2026-06-18

Abstract

arXiv:2606.18530v1 (announce type: cross)

Domain-camouflaged injection attacks embed malicious instructions in retrieved content using domain-appropriate vocabulary, evading standard detectors that rely on syntactic injection markers. When detection fails, practitioners need to know which defense architectures reduce attack success. We evaluate five prompting-based defenses (spotlighting, paraphrasing, prompt sandwiching, and two combinations) against domain-camouflaged injection across three model families (Claude Haiku, Llama 3.1 8B, Gemini 2.0 Flash) and three deployment domains (financial, legal, general) using 3,510 trials. Paraphrasing retrieved content before agent processing is the most consistently effective defense in this benchmark, reducing camouflage attack success rate by 55-84% depending on model, and achieves lower attack success rates than our Llama Guard 4 configuration on every model tested.
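
The abstract names three prompting-based defenses without giving their prompt templates. As a rough illustration only, the sketch below shows how such defenses are commonly structured when assembling an agent prompt. The function names, the delimiter strings, the reminder wording, and the `paraphraser` callable (standing in for a separate model call) are all assumptions for illustration, not the paper's actual implementation.

```python
from typing import Callable


def spotlight(retrieved: str, tag: str = "DOC") -> str:
    """Wrap retrieved text in explicit delimiters (hypothetical tag format)
    so the agent prompt can refer to it as data rather than instructions."""
    return f"<<{tag}>>\n{retrieved}\n<</{tag}>>"


def sandwich(task: str, retrieved: str) -> str:
    """Place the user's task both before and after the retrieved content,
    so the last instruction the model reads is the trusted one."""
    return f"{task}\n\n{retrieved}\n\nReminder: your task is: {task}"


def paraphrase_then_prompt(
    task: str, retrieved: str, paraphraser: Callable[[str], str]
) -> str:
    """Rewrite retrieved content before the agent sees it.

    `paraphraser` is an assumed stand-in for a separate model call that
    restates the content; the agent only ever receives the rewritten text.
    """
    rewritten = paraphraser(retrieved)
    return f"{task}\n\n{rewritten}"


def combined(task: str, retrieved: str, paraphraser: Callable[[str], str]) -> str:
    """One possible combination: paraphrase, then spotlight, then sandwich."""
    return sandwich(task, spotlight(paraphraser(retrieved)))
```

In this sketch the paraphrasing step is the only one that changes the retrieved text itself; spotlighting and sandwiching leave the content intact and only change how it is framed in the prompt. The two combinations evaluated in the paper are not specified in the abstract, so `combined` is just one plausible arrangement.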