SLAD: Shared LoRA Adapters for Task Specific Distillation

ArXiv CS.CV, 2026-05-29. Authors: Reda Bensaid, Yassir Bendou, Vincent Gripon, François Leduc-Primeau

Abstract

arXiv:2605.29726v1. Announce type: new.

In the context of resource-constrained environments such as embedded systems, adapting reduced-size foundation models to downstream tasks has become increasingly popular. This has recently motivated the emerging setting of task-specific distillation, where a larger and a smaller version of the same foundation model are both adapted to the same downstream task, with the goal of transferring knowledge from the former to the latter. Recent work has demonstrated the benefits of using a larger version of the same foundation model to assist the adaptation of a smaller one. Typically, the larger model (teacher) is first adapted via fine-tuning or linear probing before its knowledge is distilled into the smaller model (student). While fine-tuning the teacher often increases its performance, recent work showed that probing it leads to better knowledge distillation to the student.
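The abstract does not state which distillation objective is used. As a generic illustration of what "distilling the teacher's knowledge into the student" can mean, the sketch below assumes the classic temperature-softened soft-label loss (a KL divergence between teacher and student class distributions). This is not the SLAD method itself; the function names `log_softmax` and `distillation_loss` and the default temperature are hypothetical choices made for the example.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of a list of logits at a given temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_sum_exp for z in scaled]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Assumption: the common T^2 scaling is applied so that gradient
    magnitudes stay comparable across temperatures.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher (fixed target)
    log_q = log_softmax(student_logits, temperature)  # student (being trained)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return temperature ** 2 * kl


# Example: one sample with three classes.
teacher = [2.0, 0.5, -1.0]
student = [1.0, 1.0, 0.0]
loss = distillation_loss(teacher, student)
```

In a task-specific distillation setup like the one described above, the teacher logits would come from the adapted larger model (fine-tuned or linearly probed) and the student logits from the smaller model; the loss is zero only when the two softened distributions agree.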
