Representation over Routing: Diagnosing Temporal Routing Pathologies in Multi-Timescale PPO (Event)
PRODUCT_LAUNCH · 2026-06-02 · Impact: MEDIUM
arXiv:2604.13517v4 · Announce Type: replace-cross
Abstract: Temporal credit assignment in reinforcement learning is often approached by introducing value estimates at multiple discount factors. A natural next step is to let the actor dynamically route among these temporal heads, using either differentiable attention or heuristic uncertainty weights. This paper argues that such routing can create a numerical …
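The abstract describes two routing styles over value heads that each use a different discount factor. The sketch below is a minimal pure-Python illustration of that setup, not the paper's implementation. It assumes one scalar value estimate per discount factor (a "temporal head"). Attention-style routing is stood in for by fixed softmax logits rather than a learned module. Uncertainty-style routing is stood in for by inverse-variance weighting, a common heuristic that the paper may or may not use. All function names (`discounted_returns`, `attention_route`, `uncertainty_route`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """Return G_t = sum_k gamma**k * r_{t+k} for every step t of one episode.

    Computed backwards so each step reuses the return of the step after it.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attention_route(head_values: Sequence[float],
                    logits: Sequence[float]) -> Tuple[float, List[float]]:
    """Hypothetical attention-style routing: softmax weights over the heads.

    In a real actor the logits would come from a learned module; here they
    are passed in directly (an illustrative assumption).
    """
    weights = softmax(logits)
    mixed = sum(w * v for w, v in zip(weights, head_values))
    return mixed, weights


def uncertainty_route(head_values: Sequence[float],
                      head_variances: Sequence[float],
                      eps: float = 1e-8) -> Tuple[float, List[float]]:
    """Hypothetical heuristic routing: inverse-variance weights over the heads.

    Lower-variance (more certain) heads receive more weight; eps guards
    against division by zero when a variance is 0.
    """
    inverse = [1.0 / (var + eps) for var in head_variances]
    total = sum(inverse)
    weights = [w / total for w in inverse]
    mixed = sum(w * v for w, v in zip(weights, head_values))
    return mixed, weights
```

For example, an episode with rewards `[0, 0, 0, 1]` gives returns at t = 0 of 0.125, 0.729 and about 0.970 for discount factors 0.5, 0.9 and 0.99. Every head sees the same trajectory, but the heads operate on different scales. With rewards bounded by r_max, a head's return is bounded by r_max / (1 - γ). As a result, the routed estimate shifts with the routing weights alone.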