ViCuR: Visual Cues as Recoverable Privilege for Multimodal On-Policy Distillation (Event)

PRODUCT_LAUNCH · 2026-06-05 · Impact: MEDIUM

arXiv:2606.05718v1 · Announce Type: new

Abstract: On-policy distillation (OPD) improves reasoning by training a student on trajectories sampled from its own policy under supervision from a teacher. In multimodal reasoning, a common extension is to use a privileged teacher that observes training-time-only signals such as reference answers or rationales. However, such answer-side privilege creates a train-test mismatch…
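The on-policy distillation loop described in the abstract can be illustrated on a toy problem. This is a minimal sketch, not the ViCuR method: it assumes a bigram "student" parameterized by logits, a fixed hypothetical teacher table (`TEACHER`), and a per-token reverse-KL objective evaluated at the states the student itself visits. Reverse KL is a common OPD choice but is an assumption here, since the abstract does not state the loss. All names (`TEACHER`, `sample_trajectory`, `opd_step`, etc.) are hypothetical.

```python
import math
import random

# Toy vocabulary; "<bos>" is only a context state and is never emitted.
VOCAB = ["A", "B", "C", "<eos>"]
STATES = ["<bos>", "A", "B", "C"]

# Hypothetical fixed teacher: next-token distribution given the previous token.
TEACHER = {
    "<bos>": [0.7, 0.1, 0.1, 0.1],
    "A": [0.1, 0.7, 0.1, 0.1],
    "B": [0.1, 0.1, 0.1, 0.7],
    "C": [0.1, 0.1, 0.1, 0.7],
}


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reverse_kl(p, q):
    """KL(p || q), with p the student and q the teacher distribution."""
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)


def sample_trajectory(student, rng, max_len=5):
    """Sample from the student's own policy (the on-policy part); return visited states."""
    prev, states = "<bos>", []
    for _ in range(max_len):
        states.append(prev)
        tok = rng.choices(VOCAB, weights=softmax(student[prev]))[0]
        if tok == "<eos>":
            break
        prev = tok
    return states


def opd_step(student, rng, lr=0.5):
    """One distillation update on a fresh student rollout; returns mean per-step KL."""
    states = sample_trajectory(student, rng)
    grads = {s: [0.0] * len(VOCAB) for s in STATES}
    total = 0.0
    for s in states:
        p, q = softmax(student[s]), TEACHER[s]
        kl = reverse_kl(p, q)
        total += kl
        # Exact gradient: d KL(softmax(z) || q) / d z_j = p_j * ((log p_j - log q_j) - KL)
        for j, (pj, qj) in enumerate(zip(p, q)):
            grads[s][j] += pj * ((math.log(pj) - math.log(qj)) - kl)
    # Gradient descent on the student logits, accumulated over the rollout.
    for s in STATES:
        student[s] = [z - lr * g for z, g in zip(student[s], grads[s])]
    return total / len(states)


rng = random.Random(0)
student = {s: [0.0] * len(VOCAB) for s in STATES}  # uniform initial student
for _ in range(500):
    opd_step(student, rng)
print({s: round(reverse_kl(softmax(student[s]), TEACHER[s]), 4) for s in STATES})
```

Because trajectories come from the student, the teacher only supervises states the student actually reaches, which is what separates OPD from distilling on teacher-generated data. In this toy, a privileged teacher would correspond to `TEACHER` also conditioning on a training-only input (such as a reference answer) that `student` never receives.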
