Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies · Event

PRODUCT_LAUNCH · 2026-06-01 · Impact: MEDIUM

arXiv:2512.19673v3 · Announce Type: replace-cross

Abstract: Existing reinforcement learning (RL) approaches treat large language models (LLMs) as a unified policy, overlooking their internal mechanisms. In this paper, we decompose the LLM-based policy into Internal Layer Policies and Internal Modular Policies via the Transformer's residual stream. Our entropy analysis of internal policy reveals distinct
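To make the idea of a per-layer policy concrete, here is a minimal sketch under stated assumptions. It is not the paper's confirmed definition. It assumes that an internal layer policy can be read out by projecting each layer's residual-stream state through the unembedding matrix and applying a softmax (a logit-lens style readout), and that the entropy analysis measures the Shannon entropy of that distribution. The names `layer_policy_entropies`, `hidden_states`, and `unembed` are hypothetical, and random toy arrays stand in for a real model.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def layer_policy_entropies(hidden_states, unembed):
    """Entropy of an assumed per-layer token distribution.

    hidden_states: (num_layers, d_model) residual-stream state after each layer
    unembed:       (d_model, vocab) unembedding matrix (assumed shared by all layers)
    Returns an array of shape (num_layers,) with entropies in nats.
    """
    probs = softmax(hidden_states @ unembed)
    return -(probs * np.log(probs + 1e-12)).sum(axis=-1)

# Toy stand-in for a Transformer: the residual stream is the running sum
# of per-layer contributions, so each layer's state is a cumulative sum.
rng = np.random.default_rng(0)
num_layers, d_model, vocab = 4, 8, 16
increments = rng.normal(size=(num_layers, d_model))
hidden_states = np.cumsum(increments, axis=0)
unembed = rng.normal(size=(d_model, vocab))

entropies = layer_policy_entropies(hidden_states, unembed)
print(entropies)
```

With a real model, `hidden_states` would come from the model's intermediate activations rather than random draws. Whether a normalization layer is applied before the projection is a design choice this sketch leaves out.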

Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies · Related Technologies