Move the Query, Not the Cache: Characterizing Cross-Instance Latent Attention Redistribution Across GPU Fabrics

ArXiv CS.AI, 2026-06-02. Authors: Bole Ma, Jan Eitzinger, Harald Köstler, Gerhard Wellein
