Event: Rubric-Guided Process Reward for Stepwise Model Routing

PRODUCT_LAUNCH | 2026-05-29 | Impact: MEDIUM

arXiv:2605.29310v1 | Announce Type: cross

Abstract: Stepwise model routing improves the efficiency of Large Reasoning Models (LRMs) by assigning each reasoning step to a suitable model. Recent methods formulate routing as a sequential decision process and train the router with reinforcement learning. However, although they model routing as a process, they still supervise the router with outcome rewards. Such rewards only reflect final answer
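To make the outcome-reward limitation concrete, here is a minimal toy sketch of stepwise routing supervised only by the final answer. It is not the paper's method: the model names `small` and `large`, the `COST` table, and the functions `route`, `outcome_rewards`, and `total_cost` are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Step:
    index: int
    model: str  # model the router assigned to this reasoning step


# Hypothetical per-step costs; these numbers are assumptions, not from the paper.
COST = {"small": 1.0, "large": 10.0}


def route(num_steps, policy):
    """Roll out one trajectory: the router picks a model for every step."""
    return [Step(i, policy(i)) for i in range(num_steps)]


def outcome_rewards(trajectory, answer_correct):
    """Outcome supervision: each step receives the same final-answer signal."""
    reward = 1.0 if answer_correct else 0.0
    return [reward for _ in trajectory]


def total_cost(trajectory):
    """Sum the (assumed) cost of the models used along the trajectory."""
    return sum(COST[step.model] for step in trajectory)


def alternating_policy(step_index):
    """Toy policy: use the large model on even steps, the small one otherwise."""
    return "large" if step_index % 2 == 0 else "small"


trajectory = route(4, alternating_policy)
rewards = outcome_rewards(trajectory, answer_correct=True)
print([step.model for step in trajectory])
print(rewards)
print(total_cost(trajectory))
```

In this sketch every step gets an identical reward regardless of which model handled it, so the signal alone cannot tell the router which individual routing decisions helped or were wasteful.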