The Price of Anarchy in Disaggregated Inference

ArXiv CS.AI | 2026-06-17 | NEWS | en | Author: Athos Georgiou (NCA)

Details

Source site: ArXiv CS.AI
Author: Athos Georgiou (NCA)
Article type: NEWS
Language: en
Publication date: 2026-06-17

Abstract

arXiv:2606.17081v1. Announce Type: cross.

Disaggregated inference architectures physically separate prefill and decode phases onto distinct GPU pools, creating competing "agents" that share a fixed hardware budget. We provide, to our knowledge, the first formal game-theoretic analysis of this architecture, using NVIDIA Dynamo as a concrete case study. We model disaggregated serving as three coupled games: a two-player resource game between prefill and decode pools, a selfish caching game over the hierarchical KV cache, and a congestion game with positive externalities for request routing. We empirically validate the latter two; the P/D resource game is treated analytically (Section 9.2). We characterize how GPU saturation induces regime transitions that shift the game's payoff structure: below saturation, selfish behavior has bounded Price of Anarchy (PoA); at saturation, superlinear latency and cache externalities drive our empirical estimator PoA-hat (defined in Section 6.
