Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies
Event type: PRODUCT_LAUNCH | Date: 2026-06-01 | Impact: MEDIUM
arXiv:2512.19673v3 | Announce Type: replace-cross
Abstract: Existing reinforcement learning (RL) approaches treat large language models (LLMs) as a unified policy, overlooking their internal mechanisms. In this paper, we decompose the LLM-based policy into Internal Layer Policies and Internal Modular Policies via the Transformer's residual stream. Our entropy analysis of internal policy reveals distinct
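The abstract does not define how an internal policy is computed. Below is a minimal sketch, assuming a logit-lens-style reading. The intermediate residual-stream state after each layer is projected through the output unembedding and passed through a softmax, which yields a "layer policy." Each attention or MLP write is projected the same way on its own, which yields a "modular policy." The Shannon entropy of each distribution is then measured. This is an assumption and not the paper's confirmed method. Real models apply a final normalization (LayerNorm or RMSNorm) before the unembedding, and this toy omits it. The names `internal_policies` and `unembed`, along with the toy weights, are hypothetical and chosen for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]  # hidden_dim rows x vocab_size columns


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def unembed(h: Sequence[float], w_u: Matrix) -> Vector:
    """Project a hidden vector to vocabulary logits: logits[j] = sum_i h[i] * w_u[i][j]."""
    vocab_size = len(w_u[0])
    return [sum(h[i] * w_u[i][j] for i in range(len(h))) for j in range(vocab_size)]


def internal_policies(
    embedding: Sequence[float],
    blocks: Sequence[Tuple[Sequence[float], Sequence[float]]],
    w_u: Matrix,
) -> Tuple[List[Vector], List[Tuple[Vector, Vector]]]:
    """Hypothetical helper: per-layer and per-module distributions from a residual stream.

    blocks[l] = (attn_out, mlp_out), the vectors layer l adds to the residual stream.
    Assumption (logit-lens style, final normalization omitted):
      layer policy l    = softmax(unembed(residual stream after block l))
      modular policies l = softmax(unembed(attn_out)), softmax(unembed(mlp_out))
    """
    h = list(embedding)
    layer_policies: List[Vector] = []
    module_policies: List[Tuple[Vector, Vector]] = []
    for attn_out, mlp_out in blocks:
        # Residual update: each module adds its output to the running stream.
        h = [a + b + c for a, b, c in zip(h, attn_out, mlp_out)]
        layer_policies.append(softmax(unembed(h, w_u)))
        # Each module's write is unembedded in isolation.
        module_policies.append(
            (softmax(unembed(attn_out, w_u)), softmax(unembed(mlp_out, w_u)))
        )
    return layer_policies, module_policies


if __name__ == "__main__":
    # Toy unembedding: 2 hidden dims, 3 vocabulary tokens.
    w_u = [[1.0, 0.0, -1.0],
           [0.0, 1.0, 0.0]]
    blocks = [([0.0, 0.0], [0.0, 0.0]),  # layer 0 writes nothing
              ([1.0, 0.0], [1.0, 0.0])]  # layer 1 pushes toward token 0
    layers, modules = internal_policies([0.0, 0.0], blocks, w_u)
    for l, pi in enumerate(layers):
        print(l, [round(p, 3) for p in pi], round(entropy(pi), 3))
```

In this toy, the layer-0 policy is uniform and so has maximal entropy (ln 3). After layer 1 writes toward token 0, the entropy drops. Projecting cumulative residual states gives one distribution per depth, so entropy can be tracked layer by layer. Unembedding a single module's write in isolation is only one possible way to attribute a distribution to that module.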