Internalize the Temperature: On-Policy Self-Distillation as Policy Reheater for Reinforcement Learning (event)

Name: Internalize the Temperature: On-Policy Self-Distillation as Policy Reheater for Reinforcement Learning
Start: 2026-06-02

Type: PRODUCT_LAUNCH
Date: 2026-06-02
Impact: MEDIUM

arXiv:2606.00755v1
Announce Type: new

Abstract: Reinforcement learning from verifiable rewards improves the reasoning ability of large language models, but often suffers from entropy collapse, in which increasingly concentrated policies reduce rollout diversity and useful learning signals. Existing remedies either constrain the RL objective (e.g., entropy regularization) or adjust sampling tem

Category: Artificial Intelligence
