Multi-Segment Attention: Enabling Efficient KV-Cache Management for Faster Large Language Model Serving

arXiv cs.CL · 2026-06-03 · Authors: Chunan Shi, Yilei Chen, Yilin Chen, Xupeng Miao, Bin Cui