Lodestar: An Online-Learning LLM Inference Router
Event: PRODUCT_LAUNCH | Date: 2026-06-02 | Impact: MEDIUM
arXiv:2606.00946v1 | Announce Type: cross

Abstract: Efficiently serving large language model (LLM) inference tasks is crucial both for user-perceived latency, such as time-to-first-token (TTFT), and for GPU utilization. However, LLM request routing, that is, assigning each inference request to a GPU instance, is particularly challenging: execution is highly input-dependent; batching and KV-cache reuse create strong cross-request coupling; and latenc…
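A toy greedy router can make the coupling described above concrete. This is not Lodestar's algorithm, because the abstract is truncated before the method is described. `Instance`, `route`, `finish`, `observe`, the linear cost model, and the moving-average update are all hypothetical. The sketch assumes TTFT is roughly proportional to queued prefill tokens plus the uncached part of the new prompt. Two things follow from that assumption. Routing a request changes the estimates for later requests, through both queue growth and newly cached prefixes. A per-instance cost is also learned online from observed prefill times.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Prompt = Tuple[int, ...]  # a prompt as a sequence of token ids


@dataclass
class Instance:
    """One GPU serving instance as seen by a HYPOTHETICAL router (not Lodestar's)."""
    name: str
    queued_tokens: int = 0                                      # prefill tokens not yet processed
    cached_prefixes: Set[Prompt] = field(default_factory=set)  # prompts assumed resident in KV cache
    ms_per_token: float = 1.0                                   # learned prefill cost estimate


def cached_prefix_len(inst: Instance, prompt: Prompt) -> int:
    """Length of the longest cached prompt that is a prefix of `prompt`."""
    best = 0
    for p in inst.cached_prefixes:
        if len(p) <= len(prompt) and prompt[:len(p)] == p:
            best = max(best, len(p))
    return best


def estimated_ttft(inst: Instance, prompt: Prompt) -> float:
    # Assumed cost model: only the uncached suffix needs prefill,
    # and it waits behind the work already queued on this instance.
    new_tokens = len(prompt) - cached_prefix_len(inst, prompt)
    return inst.ms_per_token * (inst.queued_tokens + new_tokens)


def route(instances: List[Instance], prompt: Prompt) -> Instance:
    """Greedy choice: lowest estimated TTFT (min() keeps the first instance on ties)."""
    best = min(instances, key=lambda i: estimated_ttft(i, prompt))
    # Routing mutates state, which couples this request to future ones.
    best.queued_tokens += len(prompt) - cached_prefix_len(best, prompt)
    best.cached_prefixes.add(prompt)  # assume the KV cache for this prompt is retained
    return best


def finish(inst: Instance, prefilled_tokens: int) -> None:
    """Mark prefill work as completed on an instance."""
    inst.queued_tokens = max(0, inst.queued_tokens - prefilled_tokens)


def observe(inst: Instance, prefilled_tokens: int, prefill_ms: float, alpha: float = 0.2) -> None:
    """Online update: exponential moving average of observed ms per prefill token."""
    sample = prefill_ms / prefilled_tokens
    inst.ms_per_token = (1 - alpha) * inst.ms_per_token + alpha * sample
```

Consider an example with two idle instances. A request that extends an already-cached 8-token prefix is sent back to the instance holding that prefix, because only 2 new tokens need prefill there versus 10 elsewhere. If `observe` later reports that instance as slow, short unrelated prompts shift to the other instance. A real router would also need to model batching, decode load, and cache eviction, which this sketch ignores.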