Weak Critics Make Strong Learners: On-Policy Critique Distillation for Scalable Oversight

ArXiv CS.AI · 2026-06-02 · NEWS · en
Authors: Can Jin, Jiakang Li, Rui Wu, Eddy Zhang, Dimitris N. Metaxas
