Balancing Knowledge Distillation for Imbalance Learning with Bilevel Optimization

ArXiv CS.AI | 2026-06-02 | News | Authors: Anh B. H. Nguyen, Ba Tho Phan, Viet Cuong Ta

Abstract

arXiv:2605.17839v3 (Announce Type: replace-cross)

Knowledge distillation transfers knowledge from a high-capacity teacher to a compact student using a mixture of hard and soft losses. On imbalanced data, a fixed weighting between the hard and soft losses makes the learning process brittle. Recent studies reweight these components in long-tailed settings. However, most of these methods do not adapt the weights at the per-sample level and do not take the student's behavior during training into account. To address this, we propose BiKD, a bilevel framework that dynamically balances the hard and soft losses for each sample. We employ a weight generation network that produces adaptive per-sample weights, guided by a small balanced validation set. The student is then trained with an unconstrained combination of the weighted hard and soft losses, allowing it to relax both terms.
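The abstract gives no equations or code for BiKD, so the following is only a rough illustration of what a per-sample weighted distillation objective of this kind could look like. It is written in plain Python with the standard library. Each sample gets a hard loss (cross-entropy with its label) and a soft loss (KL divergence from the temperature-softened teacher distribution to the student's, with the common T^2 scaling). Each loss is then multiplied by its own non-negative weight from a small weight generator. Everything concrete here is an assumption and not the authors' method: the `weight_net` architecture (two linear heads plus softplus), its inputs (the sample's own two losses), the temperature `T = 4.0`, the parameter vector `theta`, and all function names are hypothetical. In the bilevel setup the abstract describes, `theta` would be tuned in an outer loop to minimize a loss like `balanced_val_loss` on the small balanced validation set, evaluated on the student produced by the inner loop. That outer update is not shown.

```python
import math


def log_softmax(z, T=1.0):
    """Temperature-scaled log-softmax, computed stably via log-sum-exp."""
    scaled = [v / T for v in z]
    m = max(scaled)
    lse = m + math.log(sum(math.exp(v - m) for v in scaled))
    return [v - lse for v in scaled]


def hard_loss(z_s, y):
    """Cross-entropy between the student's logits z_s and the true label y."""
    return -log_softmax(z_s)[y]


def soft_loss(z_s, z_t, T):
    """T^2 * KL(teacher_T || student_T) on temperature-softened distributions."""
    log_p = log_softmax(z_s, T)  # student
    log_q = log_softmax(z_t, T)  # teacher
    return T * T * sum(math.exp(lq) * (lq - lp) for lq, lp in zip(log_q, log_p))


def softplus(a):
    """Numerically stable log(1 + exp(a)); always non-negative."""
    return max(a, 0.0) + math.log1p(math.exp(-abs(a)))


def weight_net(features, theta):
    """HYPOTHETICAL weight generator: two linear heads followed by softplus.

    features = (hard loss, soft loss) of one sample (an assumed input choice).
    Returns (alpha, beta); they are NOT normalized to sum to 1.
    """
    f_h, f_s = features
    alpha = softplus(theta[0] * f_h + theta[1] * f_s + theta[2])
    beta = softplus(theta[3] * f_h + theta[4] * f_s + theta[5])
    return alpha, beta


def weighted_kd_loss(student_logits, teacher_logits, labels, theta, T=4.0):
    """Lower-level (student) objective: mean of alpha_i*L_hard_i + beta_i*L_soft_i."""
    total = 0.0
    for z_s, z_t, y in zip(student_logits, teacher_logits, labels):
        l_hard = hard_loss(z_s, y)
        l_soft = soft_loss(z_s, z_t, T)
        alpha, beta = weight_net((l_hard, l_soft), theta)
        total += alpha * l_hard + beta * l_soft
    return total / len(labels)


def balanced_val_loss(student_logits, labels):
    """Upper-level objective (assumed): plain cross-entropy on a class-balanced validation set."""
    return sum(hard_loss(z, y) for z, y in zip(student_logits, labels)) / len(labels)


# Toy usage: two samples, three classes (made-up numbers).
student = [[2.0, 0.5, -1.0], [0.1, 0.2, 0.3]]
teacher = [[3.0, 0.0, -2.0], [0.0, 1.0, 0.5]]
labels = [0, 2]
theta = [0.0] * 6  # all-zero parameters give alpha = beta = ln 2 for every sample
print(weighted_kd_loss(student, teacher, labels, theta))
```

With all-zero `theta`, both weights equal ln 2 for every sample, so the objective collapses to a fixed, equal weighting. Non-zero `theta` lets the two weights vary from sample to sample. Because `alpha` and `beta` are not forced to sum to 1, each term's overall scale can also shrink or grow independently, which is one possible reading of the abstract's "unconstrained combination".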
