On the Salience of Low-Probability Tokens for AI-Generated Text Detection: A Multiscale Uncertainty Perspective 文章

ArXiv CS.CL2026-06-02NEWSen作者: Yikai Guo, Bin Wang, Xilai Fan, Wenjun Ke, Haoran Luo

摘要

arXiv:2606.02158v1 Announce Type: new Abstract: AI-generated text increasingly blends with human writing, raising practical risks such as misinformation, academic misuse, and corpora contamination. While statistical detectors are appealing for efficiency and generalization, they suffer from two key limitations. (i) Boilerplate dominance, boilerplate tokens shared across human and LLM writing can overwhelm discriminative signals. (ii) Brittle point estimates, relying on a single probability score yields unstable decisions under adversarial manipulations. To address these issues, we propose Uncertainty, a multiscale uncertainty estimator that focuses on informative low-probability tokens, which more clearly expose distributional discrepancies. Locally, it alleviates boilerplate dominance by averaging the log-probabilities of low-probability tokens; globally, it reduces brittleness by capturing the distributional shape of this low-probability region via R\'enyi entropy.

相关公司

暂无数据

相关人物

暂无数据

相关产品

暂无数据