Bounded Behavioral Indistinguishability for Black-Box LLM Distillation

ArXiv CS.CL | 2026-06-01 | Author: Munawar Hasan

Abstract

arXiv:2605.30448v1 (cross-listed). Black-box LLM distillation is usually evaluated as an output-matching problem: a student is considered successful when its responses are semantically similar to, or task-consistent with, those of a teacher. However, output similarity does not imply that the student is behaviorally indistinguishable from the model it imitates. We introduce bounded behavioral indistinguishability, formalized as $(\epsilon,q,t,\mathbb{A})$-behavioral indistinguishability over an explicit prompt distribution, where $\epsilon$ bounds distinguishing advantage, $q$ bounds oracle queries, $t$ bounds computation, and $\mathbb{A}$ denotes the adversary class. We instantiate this notion on Qwen and Llama teacher-student pairs using a controlled 5,000-prompt behavioral probe suite.
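To make the role of $\epsilon$ and $q$ concrete, the following is a minimal sketch of a distinguishing game, not the paper's method. It assumes the usual cryptographic-style form of advantage, $|\Pr[A^{\text{teacher}}=1] - \Pr[A^{\text{student}}=1]|$, which the abstract does not spell out. The toy `teacher`, `student`, `all_yes_adversary`, and `estimate_advantage` are hypothetical names introduced for illustration, and the computation bound $t$ is not modeled.

```python
import random

# Hypothetical toy oracles standing in for real models (assumption, not from the paper).
def teacher(prompt, rng):
    """Toy teacher: always answers 'yes'."""
    return "yes"

def student(prompt, rng):
    """Toy student: agrees with the teacher 90% of the time."""
    return "yes" if rng.random() < 0.9 else "no"

def all_yes_adversary(transcript):
    """One member of a hypothetical adversary class: output 1 ('teacher')
    only if every observed response is 'yes'."""
    return all(response == "yes" for _, response in transcript)

def estimate_advantage(model_a, model_b, adversary, prompts, q, trials=2000, seed=0):
    """Monte Carlo estimate of |Pr[A outputs 1 | oracle = model_a]
    - Pr[A outputs 1 | oracle = model_b]| with at most q oracle queries,
    prompts drawn uniformly from an explicit prompt list."""
    rng = random.Random(seed)
    ones = []
    for oracle in (model_a, model_b):
        count = 0
        for _ in range(trials):
            queries = [rng.choice(prompts) for _ in range(q)]
            transcript = [(p, oracle(p, rng)) for p in queries]
            if adversary(transcript):
                count += 1
        ones.append(count / trials)
    return abs(ones[0] - ones[1])

prompts = ["Is 7 prime?", "Is water wet?", "Is 2 + 2 = 4?"]
adv_q1 = estimate_advantage(teacher, student, all_yes_adversary, prompts, q=1)
adv_q5 = estimate_advantage(teacher, student, all_yes_adversary, prompts, q=5)
print(f"estimated advantage with q=1: {adv_q1:.3f}")
print(f"estimated advantage with q=5: {adv_q5:.3f}")
```

In this toy setting the true advantage of the adversary is $1 - 0.9^q$, so a student whose individual outputs match the teacher 90% of the time becomes easier to distinguish as the query budget grows. Under this reading, a pair would be called $(\epsilon,q,t,\mathbb{A})$-indistinguishable only if no adversary in the class exceeds advantage $\epsilon$ within the stated budgets.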
