Detector-Evasive LLM Paraphrasing via Constrained Policy Optimization

ArXiv CS.AI, 2026-06-02. Authors: Mingyi Wang, Zhuoer Shen, Yuheng Bu, Shaofeng Zou

Abstract

arXiv:2606.00392v1. Announce type: cross.

AI-text detectors are vulnerable to paraphrasing and detector-guided paraphrasing attacks, but existing detector-evasion methods often lack precise control over semantic preservation. In particular, optimizing directly for detector evasion can degrade fine-grained semantics, whereas scalarized reward designs provide only indirect, weight-sensitive control over the evasion-semantics trade-off. We address this limitation by formulating detector-evasive LLM paraphrasing as a Constrained Markov Decision Process, where detector evasion is the primary objective and semantic preservation is enforced as an explicit constraint. We propose Detector Evasion Policy Optimization (DEPO), a Lagrangian primal-dual reinforcement learning algorithm with a novel GRPO-style group-based policy update.
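The abstract does not give the reward, the constraint form, or the update rule, so the following is only a minimal sketch of a generic Lagrangian primal-dual step with group-standardized advantages, not the DEPO algorithm itself. It assumes a per-sample evasion score, a per-sample semantic similarity score, and a similarity threshold; the function names (`lagrangian_rewards`, `group_advantages`, `dual_update`) and all numeric values are hypothetical and chosen purely for illustration.

```python
from statistics import mean, pstdev


def lagrangian_rewards(evasion, similarity, lam, threshold):
    """Penalize each sample's evasion score by its constraint cost.

    Assumption: the constraint cost is (threshold - similarity), which is
    positive when a paraphrase falls below the similarity threshold.
    """
    return [e - lam * (threshold - s) for e, s in zip(evasion, similarity)]


def group_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: standardize rewards within one prompt's group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def dual_update(lam, similarity, threshold, lr):
    """Projected gradient ascent on the multiplier, kept non-negative."""
    violation = threshold - mean(similarity)
    return max(0.0, lam + lr * violation)


if __name__ == "__main__":
    # Hypothetical scores for four paraphrases sampled from one prompt.
    evasion = [0.9, 0.2, 0.6, 0.4]
    similarity = [0.70, 0.95, 0.85, 0.90]
    lam, threshold = 1.0, 0.85

    rewards = lagrangian_rewards(evasion, similarity, lam, threshold)
    print("advantages:", group_advantages(rewards))
    print("next lambda:", dual_update(lam, similarity, threshold, lr=0.5))
```

In this sketch the multiplier grows while the group's mean similarity stays below the threshold and shrinks toward zero once the constraint is satisfied, which is the mechanism that replaces a hand-tuned scalarization weight in a primal-dual scheme.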