摘要
arXiv:2605.27115v1 Announce Type: new Abstract: Domain specialization can improve LLM behavior in vertical domains, but often weakens the general capabilities inherited from the original model. Recent Multi-Teacher On-Policy Distillation (MOPD) pipelines recover model capabilities by supervising student-generated trajectories with teacher feedback, but typically assume teacher-aligned prompt coverage, requiring prompts to match the teachers' training distributions. This assumption is difficult to satisfy when the general teacher is an open-source model whose post-training data are unknown. Instead of attempting to reconstruct this hidden distribution, we study general capability recovery with readily available proxy general prompts.
相关事件查看全部 (2)
Counteraction-Aware Multi-Teacher On-Policy Distillation for General Capability Recovery with Domain Preservation
2026-05-27OPEN_SOURCE影响: MEDIUM
Counteraction-Aware Multi-Teacher On-Policy Distillation for General Capability Recovery with Domain Preservation
2026-05-27PRODUCT_LAUNCH影响: MEDIUM
相关人物
暂无数据