Weak Critics Make Strong Learners: On-Policy Critique Distillation for Scalable Oversight

ArXiv CS.AI | 2026-06-02 | Authors: Can Jin, Jiakang Li, Rui Wu, Eddy Zhang, Dimitris N. Metaxas

Abstract

arXiv:2606.00424v1

As large language models become stronger, weak supervisors may fail to provide reliable labels, preferences, or final judgments for complex outputs, limiting both weak-to-strong generalization and scalable oversight. We study a more tractable form of weak supervision: using a weak model as a critic rather than as a labeler or judge. Instead of solving the task or selecting the correct answer, the weak critic only needs to provide a non-misleading revision direction that helps the strong model better use its own knowledge. We call this setting *weak-critic strong oversight*. We first show that weak critiques can improve frozen strong models at inference time, and that critique quality is key to this improvement. We then propose progressive on-policy critique distillation (**OPCD**), which filters high-quality critiques and distills critic-guided behavior into the strong model through adaptive self-teacher signals.
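
The inference-time setting described in the abstract, where a frozen strong model revises its answer using a weak critic's feedback, can be pictured as a draft, critique, revise loop. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the function name `critique_and_revise`, the `strong` and `critic` callables, the `rounds` parameter, and the prompt template are all hypothetical, since the abstract does not specify any of them. It also omits the critique filtering and distillation steps of OPCD.

```python
from typing import Callable


def critique_and_revise(
    prompt: str,
    strong: Callable[[str], str],        # hypothetical: frozen strong model, prompt -> answer
    critic: Callable[[str, str], str],   # hypothetical: weak critic, (prompt, answer) -> critique
    rounds: int = 1,                     # hypothetical: number of revision rounds
) -> str:
    """Draft with the strong model, then revise using the weak critic's feedback."""
    answer = strong(prompt)
    for _ in range(rounds):
        # The critic does not solve the task; it only suggests a revision direction.
        feedback = critic(prompt, answer)
        # Assumed prompt template: the strong model sees its own draft plus the critique.
        revise_prompt = (
            f"{prompt}\n\nPrevious answer:\n{answer}\n\n"
            f"Critique:\n{feedback}\n\nRevised answer:"
        )
        answer = strong(revise_prompt)
    return answer
```

In practice `strong` and `critic` would wrap calls to two different language models; here they are left as plain callables so the control flow can be exercised with simple stand-in functions.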
