An Embarrassingly Simple Detector for Model Extraction Attacks in Large Language Model API Traffic

ArXiv CS.CL | 2026-06-05 | Authors: Shuze Liu, Qianwen Guo, Yushun Dong

Abstract

arXiv:2606.05725v1 (announce type: cross)

Large language models (LLMs) are increasingly deployed through hosted APIs, making model extraction a practical threat to model ownership and service security. However, individual extraction queries often resemble benign requests, and existing evaluations tend to focus on single-query anomaly scoring or on settings where each user is purely benign or purely an attacker. We formulate model extraction monitoring as benign-calibrated traffic-window distribution testing and show that an embarrassingly simple detector is effective: embed incoming queries into a semantic space and test whether their aggregate distribution deviates from historical benign traffic. We instantiate the detector with maximum mean discrepancy (MMD), using only benign-vs-benign comparisons to set the decision threshold. We evaluate on fourteen attacker-normal query pairs from four extraction scenarios and compare with adapted PRADA, SEAT, CAP, DATE, and marginal Mahalanobis baselines.
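To make the idea of a benign-calibrated window test concrete, here is a minimal sketch in Python. It is not the authors' implementation. The abstract does not specify the kernel, bandwidth, embedding model, window size, or calibration quantile, so everything below is an assumption chosen for illustration: an RBF kernel with a fixed bandwidth, a 0.99 quantile for the threshold, and synthetic Gaussian vectors standing in for query embeddings (the "attack" window is simply a mean-shifted Gaussian).

```python
import numpy as np

def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel matrix between the rows of x and the rows of y."""
    sq_dists = (
        (x ** 2).sum(axis=1)[:, None]
        + (y ** 2).sum(axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def mmd2(x, y, gamma):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y."""
    return (
        rbf_kernel(x, x, gamma).mean()
        + rbf_kernel(y, y, gamma).mean()
        - 2.0 * rbf_kernel(x, y, gamma).mean()
    )

def calibrate_threshold(reference, benign_pool, window, n_trials, quantile, gamma, rng):
    """Pick the threshold from benign-vs-benign comparisons only (no attack data)."""
    stats = []
    for _ in range(n_trials):
        idx = rng.choice(len(benign_pool), size=window, replace=False)
        stats.append(mmd2(reference, benign_pool[idx], gamma))
    return float(np.quantile(stats, quantile))

rng = np.random.default_rng(0)
dim, window = 16, 100            # assumed embedding size and traffic-window size
gamma = 1.0 / (2.0 * dim)        # assumed fixed bandwidth, not taken from the paper

# Synthetic stand-ins for embedded queries (assumption: real embeddings would
# come from a sentence encoder applied to API traffic).
reference = rng.normal(size=(500, dim))       # historical benign traffic
benign_pool = rng.normal(size=(2000, dim))    # held-out benign traffic for calibration
threshold = calibrate_threshold(reference, benign_pool, window, 200, 0.99, gamma, rng)

benign_window = rng.normal(size=(window, dim))
attack_window = rng.normal(loc=0.5, size=(window, dim))  # mean-shifted "attacker" window

benign_stat = mmd2(reference, benign_window, gamma)
attack_stat = mmd2(reference, attack_window, gamma)
print(f"threshold={threshold:.4f} benign={benign_stat:.4f} attack={attack_stat:.4f}")
print("attack window flagged:", attack_stat > threshold)
```

Because the threshold is a quantile of benign-vs-benign statistics, the calibration quantile directly controls the expected false-alarm rate on benign windows, and no attacker data is needed to set it.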
