OBCache: Optimal Brain KV Cache Pruning for Efficient Long-Context LLM Inference (Article)

ArXiv CS.CL · 2026-06-01 · News · Language: en · Authors: Yuzhe Gu, Xiyu Liang, Jiaojiao Zhao, Enmao Diao
