A Predictive Law for On-Policy Self-Distillation From World Feedback

ArXiv CS.AI · 2026-05-29 · Authors: Tommy He, Jerome Sieber, Matteo Saponati
