Rubric-Guided Process Reward for Stepwise Model Routing
Event: PRODUCT_LAUNCH | 2026-05-29 | Impact: MEDIUM
arXiv:2605.29310v1 | Announce Type: cross
Abstract: Stepwise model routing improves the efficiency of Large Reasoning Models (LRMs) by assigning each reasoning step to a suitable model. Recent methods formulate routing as a sequential decision process and train the router with reinforcement learning. However, although they model routing as a process, they still supervise the router with outcome rewards. Such rewards only reflect final answer
Source: ArXiv CS.CL, 2026-05-29