LLM Watermark Evasion via Bias Inversion 文章

ArXiv CS.AI2026-05-28NEWSen作者: Jeongyeon Hwang, Sangdon Park, Jungseul Ok

摘要

arXiv:2509.23019v5 Announce Type: replace-cross Abstract: Watermarking offers a promising solution for detecting LLM-generated content, yet its robustness under realistic query-free (black-box) evasion remains an open challenge. Existing query-free attacks often achieve limited success or severely distort semantic meaning. We bridge this gap by theoretically analyzing rewriting-based evasion, demonstrating that reducing the average conditional probability of sampling green tokens by a small margin causes the detection probability to decay exponentially. Guided by this insight, we propose the \emph{Bias-Inversion Rewriting Attack} (BIRA), a practical query-free method that applies a negative logit bias to a proxy suppression set identified via token surprisal. Empirically, BIRA achieves state-of-the-art evasion rates ($>99\%$) across diverse watermarking schemes while preserving semantic fidelity substantially better than prior baselines.

LLM Watermark Evasion via Bias Inversion 文章

摘要

相关事件查看全部 (2)

相关公司

相关人物

相关产品

相关技术查看全部 (2)