OGLS-SD: On-Policy Self-Distillation with Outcome-Guided Logit Steering for LLM Reasoning

ArXiv CS.AI · 2026-06-02 · NEWS · en · Authors: Yuxiao Yang, Xiaoyun Wang, Weitong Zhang
