Balancing Knowledge Distillation for Imbalance Learning with Bilevel Optimization

ArXiv CS.AI | 2026-06-02 | Authors: Anh B. H. Nguyen, Ba Tho Phan, Viet Cuong Ta

Abstract

arXiv:2605.17839v3 (announce type: replace-cross)

Knowledge distillation transfers knowledge from a high-capacity teacher to a compact student using a mixture of hard and soft losses. On imbalanced data, a fixed weighting between the hard and soft losses makes the learning process brittle. Recent studies try to reweight these components in long-tailed settings. However, most of these methods do not adapt the weights at the sample level and do not take the student's behavior during training into account. To address this, we propose BiKD, a bilevel framework that dynamically balances the hard and soft losses for each sample. We employ a weight generation network that produces adaptive per-sample weights, guided by a small balanced validation set. The student is then trained on an unconstrained combination of the weighted hard and soft losses, which allows it to relax both terms.
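The per-sample weighting described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the temperature value, the KL direction, and the hand-picked weights are all assumptions made for illustration. In BiKD the weights would come from the weight generation network trained against the balanced validation set; neither that network nor the bilevel update is shown here.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def hard_loss(student_logits, label):
    """Cross-entropy between the student prediction and the true label."""
    return -math.log(softmax(student_logits)[label])


def soft_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The temperature and the KL direction are assumptions (a common
    distillation choice), not details taken from the abstract.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)


def weighted_kd_loss(batch, weights, temperature=2.0):
    """Batch mean of w_hard_i * CE_i + w_soft_i * KL_i.

    Each weight pair is free: the two values need not sum to one,
    which is one reading of the "unconstrained combination".
    """
    total = 0.0
    for (s_logits, t_logits, label), (w_hard, w_soft) in zip(batch, weights):
        total += w_hard * hard_loss(s_logits, label)
        total += w_soft * soft_loss(s_logits, t_logits, temperature)
    return total / len(batch)


# Toy batch: (student logits, teacher logits, label). Values are made up.
batch = [
    ([2.0, 0.5, -1.0], [3.0, 0.2, -2.0], 0),  # teacher agrees with the label
    ([0.1, 0.3, 0.2], [1.5, 0.1, -0.5], 2),   # teacher disagrees with the label
]

fixed = [(0.5, 0.5), (0.5, 0.5)]       # one global mixing ratio
per_sample = [(0.5, 0.5), (1.2, 0.1)]  # hypothetical adaptive weights

loss_fixed = weighted_kd_loss(batch, fixed)
loss_adaptive = weighted_kd_loss(batch, per_sample)
print(f"fixed weighting:      {loss_fixed:.4f}")
print(f"per-sample weighting: {loss_adaptive:.4f}")
```

In the second sample the teacher favors a class other than the label, so the hypothetical adaptive weights lean on the hard loss and shrink the soft term. A single global ratio cannot make that adjustment per sample.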
