摘要
arXiv:2601.19919v2 Announce Type: replace Abstract: Knowledge distillation (KD) is one of the most effective paradigms for compressing large-scale foundation models into deployable architectures. In the context of Automatic Speech Recognition (ASR), previous studies have predominantly focused on forcing the student model to strictly mimic the predictive distribution of a massive teacher model. However, this static dependency often presents an inherent trade-off: while the student rapidly acquires basic linguistic representations, it simultaneously inherits the teacher's domain-specific blind spots and over-confident hallucinations, leading to a severe decline in out-of-distribution generalization capacity. To effectively mitigate this issue, we propose Adaptive Self-Knowledge Distillation (ASKD), a dynamic curriculum framework.
相关事件查看全部 (2)
相关公司
暂无数据
相关人物
暂无数据
相关产品
暂无数据