An Embarrassingly Simple Detector for Model Extraction Attacks in Large Language Model API Traffic 文章

ArXiv CS.CL2026-06-05NEWSen作者: Shuze Liu, Qianwen Guo, Yushun Dong

摘要

arXiv:2606.05725v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly deployed through hosted APIs, making model extraction a practical threat to model ownership and service security. However, individual extraction queries often resemble benign requests, and existing evaluations often focus on single-query anomaly scoring or pure benign-versus-attacker user settings. We formulate model extraction monitoring as benign-calibrated traffic-window distribution testing and show that an embarrassingly simple detector is effective: embed incoming queries into a semantic space and test whether their aggregate distribution deviates from historical benign traffic. We instantiate the detector with maximum mean discrepancy (MMD), using only benign-vs-benign comparisons to set the decision threshold. We evaluate on fourteen attacker-normal query pairs from four extraction scenarios and compare with adapted PRADA, SEAT, CAP, DATE, and marginal Mahalanobis baselines.

相关公司

暂无数据

相关人物

暂无数据

相关产品

暂无数据