Evaluating Prompting-Based Defenses Against Domain-Camouflaged Injection Attacks 文章

ArXiv CS.CL2026-06-18NEWSen作者: Aaditya Pai

详细信息

来源站点: ArXiv CS.CL
作者: Aaditya Pai
文章类型: NEWS
语言: en
发布日期: 2026-06-18

摘要

arXiv:2606.18530v1 Announce Type: cross Abstract: Domain-camouflaged injection attacks embed malicious instructions in retrieved content using domain-appropriate vocabulary, evading standard detectors that rely on syntactic injection markers. When detection fails, practitioners need to know which defense architectures reduce attack success. We evaluate five prompting-based defenses (spotlighting, paraphrasing, prompt sandwiching, and two combinations) against domain-camouflaged injection across three model families (Claude Haiku, Llama 3.1 8B, Gemini 2.0 Flash) and three deployment domains (financial, legal, general) using 3,510 trials. Paraphrasing retrieved content before agent processing is the most consistently effective defense in this benchmark, reducing camouflage attack success rate by 55-84\% depending on model, and achieves lower attack success rates than our Llama Guard 4 configuration on every model tested.

Evaluating Prompting-Based Defenses Against Domain-Camouflaged Injection Attacks 文章

详细信息

摘要

相关事件

相关公司

相关人物

相关产品查看全部 (4)

相关技术查看全部 (4)