Gap-K%: Measuring Top-1 Prediction Gap for Detecting Pretraining Data (Event)
BREAKTHROUGH · 2026-06-01 · Impact: HIGH
arXiv:2601.19936v2 · Announce Type: replace-cross

Abstract: The opacity of massive pretraining corpora in Large Language Models (LLMs) raises significant privacy and copyright concerns, making pretraining data detection a critical challenge. Existing state-of-the-art methods typically rely on token likelihoods, yet they often overlook the gap between the target token and the model's top-1 prediction, as well as local correlatio…
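The abstract contrasts likelihood-only detectors with a signal based on the gap between the target token and the model's top-1 prediction. The sketch below is a hypothetical illustration of that idea, not the paper's Gap-K% algorithm, because the abstract is truncated before the method is specified. It assumes per-position logits are already available, scores each token by the log-probability gap to the top-1 prediction, and averages the largest k% of gaps, borrowing the aggregation style of likelihood-based detectors such as Min-K% Prob (which average the lowest k% of token log-probabilities). The names `top1_gaps` and `gap_k_score`, the default `k=0.2`, and the sign convention are all assumptions; the local-correlation component the abstract mentions is not modeled.

```python
from typing import List, Sequence


def top1_gaps(logits_rows: Sequence[Sequence[float]],
              target_ids: Sequence[int]) -> List[float]:
    """Per-token gap between the model's top-1 prediction and the actual token.

    Log-softmax subtracts the same constant from every entry of a row, so the
    log-probability gap log p(top-1) - log p(target) equals the raw logit gap.
    The gap is 0.0 when the target token is the model's argmax.
    """
    if len(logits_rows) != len(target_ids):
        raise ValueError("need one logits row per target token")
    return [max(row) - row[t] for row, t in zip(logits_rows, target_ids)]


def gap_k_score(logits_rows: Sequence[Sequence[float]],
                target_ids: Sequence[int],
                k: float = 0.2) -> float:
    """Hypothetical Gap-K%-style score: negated mean of the largest k-fraction of gaps.

    Higher (closer to 0) means even the worst-fit tokens sit near the model's
    top-1 prediction, which this sketch treats as a membership signal.
    """
    if not 0.0 < k <= 1.0:
        raise ValueError("k must be in (0, 1]")
    gaps = sorted(top1_gaps(logits_rows, target_ids), reverse=True)
    if not gaps:
        raise ValueError("empty sequence")
    n = max(1, int(len(gaps) * k))  # keep at least one token
    return -sum(gaps[:n]) / n


# Toy example: 3 positions over a 3-token vocabulary.
logits = [[2.0, 1.0, 0.0], [0.5, 3.0, 0.5], [1.0, 1.0, 4.0]]
targets = [0, 0, 2]
print(top1_gaps(logits, targets))           # [0.0, 2.5, 0.0]
print(gap_k_score(logits, targets, k=0.2))  # -2.5
```

A sequence whose tokens mostly coincide with the model's top-1 prediction scores near 0. Any membership threshold on such a score would have to be calibrated on held-out member and non-member text.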
Source: arXiv cs.CL, 2026-06-01