ChunkLLM: A Lightweight Pluggable Framework for Accelerating LLMs Inference

arXiv cs.CL | 2026-05-26 | Authors: Haojie Ouyang, Jianwei Lv, Lei Ren, Chen Wei, Xiaojie Wang, Fangxiang Feng

Abstract

arXiv:2510.02361v2

Transformer-based large models excel in natural language processing and computer vision, but they are computationally inefficient because the cost of self-attention grows quadratically with the number of input tokens. Recent methods based on block selection and compression alleviate this problem, but they suffer from either semantic incompleteness or poor training and inference efficiency. To address these challenges, we propose ChunkLLM, a lightweight and pluggable training framework. It introduces two components: the QK Adapter (a Q-Adapter and a K-Adapter) and the Chunk Adapter. The QK Adapter is attached to each Transformer layer and serves two purposes: feature compression and chunk attention acquisition. The Chunk Adapter operates at the bottommost layer of the model and detects chunk boundaries by leveraging contextual semantic information.
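To make the idea of chunk-level selection concrete, the following is a minimal sketch of generic block selection, not the ChunkLLM method itself. It assumes chunk boundaries are already given (the abstract assigns that job to the Chunk Adapter) and it stands in mean pooling for the learned compression that the abstract attributes to the QK Adapter. The function names `chunk_means`, `select_top_k_chunks`, and `full_attention_pairs` are hypothetical and exist only for illustration.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def chunk_means(keys: Sequence[Vector], boundaries: Sequence[int]) -> List[List[float]]:
    """Compress each chunk of key vectors into one mean vector.

    `boundaries` holds the exclusive end index of each chunk.
    Mean pooling is an assumption standing in for a learned adapter.
    """
    reps: List[List[float]] = []
    start = 0
    for end in boundaries:
        chunk = keys[start:end]
        dim = len(chunk[0])
        reps.append([sum(v[d] for v in chunk) / len(chunk) for d in range(dim)])
        start = end
    return reps


def select_top_k_chunks(
    query: Vector, keys: Sequence[Vector], boundaries: Sequence[int], k: int
) -> List[int]:
    """Return the indices of the k chunks whose representative scores highest."""
    reps = chunk_means(keys, boundaries)
    scores = [dot(query, rep) for rep in reps]
    ranked = sorted(range(len(reps)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def full_attention_pairs(n: int) -> int:
    """Number of query-key pairs scored by causal full attention over n tokens."""
    return n * (n + 1) // 2


# Six 2-dimensional keys split into three chunks of two tokens each.
keys = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [-1.0, 0.0], [-1.0, 0.0]]
boundaries = [2, 4, 6]
query = [1.0, 0.5]
selected = select_top_k_chunks(query, keys, boundaries, k=2)
```

In this toy setup the query scores three chunk representatives instead of six keys and then attends only inside the selected chunks. `full_attention_pairs` shows the quadratic growth that this kind of selection tries to avoid: doubling the sequence length roughly quadruples the number of scored pairs.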
