StreamKL: Fast and Memory-Efficient KL Divergence for Boosting Attention Distillation · Article

ArXiv CS.AI · 2026-06-19 · NEWS · en · Authors: Guangda Liu, Yiquan Wang, Chengwei Li, Wenhao Chen, Jing Lin, Yiwu Yao, Danning Ke, Wenchao Ding, Jieru Zhao
