Inference Time Context Sparsity: Illusion or Opportunity? 文章

ArXiv CS.AI2026-05-26NEWSen作者: Sahil Joshi, Prithvi Dixit, Agniva Chowdhury, Anshumali Shrivastava, Joseph E. Gonzalez, Ion Stoica, Kumar Krishna Agrawal, Aditya Desai

查看原文 →

关系图谱

摘要

arXiv:2605.24168v1 Announce Type: new Abstract: Sparsity has long been a central theme in LLM efficiency, but its role in context processing remains unresolved. As LLM workloads shift toward longer contexts and agentic interactions, the compute and memory bottlenecks of attention become increasingly critical, raising the question of whether these constraints are fundamental. Our position is that these constraints are artificial and unnecessary, and that the future of LLM inference lies in extreme but principled sparsity along the context dimension. This position is supported by several strands of empirical and theoretical evidence. First, we find the insistence on dense attention unreasonable, since in a long context a query effectively projects O(N) attention information into a hidden space of dimension d << N, making the process inherently lossy.

Inference Time Context Sparsity: Illusion or Opportunity? 文章

摘要

相关事件查看全部 (1)

相关公司查看全部 (3)

相关人物

相关产品查看全部 (10)

相关技术查看全部 (22)