SeqRoute: Global Budget-Aware Sequential LLM Routing via Offline Reinforcement Learning

ArXiv CS.AI | 2026-05-26 | Authors: Zhongling Xu, Shunan Zheng, Wei Wang

Abstract

arXiv:2605.25424v1 (announce type: cross). Existing LLM routing frameworks treat queries as independent events, neglecting the sequential nature of real-world user sessions, which are constrained by a global computational budget. This mismatch leads to budget bankruptcy: myopic routing policies exhaust resources on early interactions, forcing later and often more complex queries onto inadequate models. We introduce SeqRoute, a framework that formulates multi-turn routing as a finite-horizon Markov Decision Process and solves it via offline reinforcement learning. By incorporating the remaining budget into the state space and training with Conservative Q-Learning (CQL), SeqRoute learns delayed gratification, strategically preserving resources for high-stakes turns later in the session. To overcome data starvation, we propose Hindsight Budget Relabeling (HBR).
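The budget-bankruptcy effect described in the abstract can be illustrated with a toy session. The sketch below is not SeqRoute: it assumes the reward of every model on every query is known in advance and solves the finite-horizon problem exactly by dynamic programming over (turn, remaining budget), whereas the abstract describes learning a policy from logged data with CQL. The model names, costs, reward values, and the functions `myopic_route` and `planned_route` are all hypothetical and chosen only for illustration. HBR is not sketched because the abstract does not describe its mechanics.

```python
from functools import lru_cache

# Hypothetical toy setup (not from the paper): two models with integer costs.
COSTS = {"small": 1, "large": 4}


def quality(model: str, difficulty: int) -> int:
    """Toy reward on a 0-10 scale; the small model degrades on hard queries."""
    if model == "large":
        return 10
    return 8 if difficulty <= 1 else 2


def myopic_route(difficulties, budget):
    """Per-query greedy: take the best affordable model, ignoring later turns."""
    total, plan = 0, []
    for d in difficulties:
        affordable = [m for m, c in COSTS.items() if c <= budget]
        if not affordable:
            plan.append(None)  # nothing affordable: the query goes unanswered
            continue
        choice = max(affordable, key=lambda m: quality(m, d))
        budget -= COSTS[choice]
        total += quality(choice, d)
        plan.append(choice)
    return total, tuple(plan)


def planned_route(difficulties, budget):
    """Exact finite-horizon planning with state = (turn, remaining budget)."""
    horizon = len(difficulties)

    @lru_cache(maxsize=None)
    def value(t, remaining):
        if t == horizon:
            return 0, ()
        # Baseline action: skip this turn and spend nothing.
        future, rest = value(t + 1, remaining)
        best = (future, (None,) + rest)
        for model, cost in COSTS.items():
            if cost <= remaining:
                future, rest = value(t + 1, remaining - cost)
                candidate = (quality(model, difficulties[t]) + future,
                             (model,) + rest)
                if candidate[0] > best[0]:
                    best = candidate
        return best

    return value(0, budget)


if __name__ == "__main__":
    session = (1, 1, 1, 3)  # three easy turns, then one hard turn
    print(myopic_route(session, budget=7))
    print(planned_route(session, budget=7))
```

In this toy session the greedy router spends most of the budget on the first easy turn and can only afford the small model for the hard final turn, while the planner that tracks remaining budget defers the expensive call to the last turn.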
