Counteraction-Aware Multi-Teacher On-Policy Distillation for General Capability Recovery with Domain Preservation

ArXiv CS.AI | 2026-05-27 | NEWS | en | Authors: Tianlei Chen, Jiao Ou, Ziyuan Liu, Ruiming Tang, Jian Liang, Han Li

Details

Source site: ArXiv CS.AI
Authors: Tianlei Chen, Jiao Ou, Ziyuan Liu, Ruiming Tang, Jian Liang, Han Li
Article type: NEWS
Language: en
Publication date: 2026-05-27

Abstract

arXiv:2605.27115v1 (announce type: new)

Domain specialization can improve LLM behavior in vertical domains, but it often weakens the general capabilities inherited from the original model. Recent Multi-Teacher On-Policy Distillation (MOPD) pipelines recover model capabilities by supervising student-generated trajectories with teacher feedback, but they typically assume teacher-aligned prompt coverage, which requires prompts to match the teachers' training distributions. This assumption is difficult to satisfy when the general teacher is an open-source model whose post-training data are unknown. Instead of attempting to reconstruct this hidden distribution, we study general capability recovery with readily available proxy general prompts.
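The abstract describes MOPD only at a high level: the student generates trajectories and teachers supply feedback on them. As a rough illustration, and not the paper's actual objective, the sketch below uses one common formulation of on-policy distillation, a per-token reverse KL divergence from the student to each teacher, evaluated at positions of a student-sampled trajectory. The function names (`softmax`, `reverse_kl`, `mopd_loss`), the teacher names, and the fixed mixing weights are all hypothetical; the counteraction-aware component named in the title is not specified in this excerpt and is not modeled here.

```python
import math

def softmax(logits):
    """Convert a list of logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reverse_kl(student_probs, teacher_probs, eps=1e-12):
    """KL(student || teacher) at a single token position."""
    return sum(
        p * (math.log(p + eps) - math.log(q + eps))
        for p, q in zip(student_probs, teacher_probs)
    )

def mopd_loss(student_logits, teacher_logits_by_name, weights):
    """Weighted multi-teacher reverse KL, averaged over token positions.

    student_logits: per-position logits on a trajectory sampled from the student.
    teacher_logits_by_name: teacher name -> per-position logits on the SAME trajectory.
    weights: teacher name -> mixing weight (an assumption; fixed here for simplicity).
    """
    total = 0.0
    for pos, s_logits in enumerate(student_logits):
        s_probs = softmax(s_logits)
        for name, t_logits in teacher_logits_by_name.items():
            t_probs = softmax(t_logits[pos])
            total += weights[name] * reverse_kl(s_probs, t_probs)
    return total / len(student_logits)

# Toy usage: one position, two-token vocabulary, two hypothetical teachers.
student = [[2.0, 0.0]]
teachers = {"general": [[0.0, 2.0]], "domain": [[2.0, 0.0]]}
loss = mopd_loss(student, teachers, {"general": 0.5, "domain": 0.5})
```

In this toy case the domain teacher agrees with the student and contributes zero, so the whole loss comes from the disagreeing general teacher. With real models, the logits would come from scoring the student's own sampled tokens under each teacher.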
