LLM Compression by Block Removal with Constrained Binary Optimization

ArXiv CS.CL | 2026-06-18 | NEWS | en | Authors: David Jansen, Roman Rausch, Ali Hashemi, David Montero, Román Orús

Details

Source site
ArXiv CS.CL
Authors
David Jansen, Roman Rausch, Ali Hashemi, David Montero, Román Orús
Article type
NEWS
Language
en
Publication date
2026-06-18

Abstract

arXiv:2602.00161v2 (Announce Type: replace-cross)

Abstract: In this paper, we formulate the compression of large language models (LLMs) by optimally deleting transformer blocks ("block removal") as a constrained binary optimization (CBO) problem that can be mapped to a physical system (an Ising glass) whose energies are a strong proxy for downstream model performance. This formulation enables efficient ranking of a large number of candidate block-removal configurations, yielding many high-quality, non-trivial solutions beyond those that only remove consecutive regions. Our method performs strongly in the deep-compression regime: for 50% compression of Llama-3.3-70B-Instruct, for example, we achieve an improvement of almost 23 percentage points on the MMLU benchmark over other state-of-the-art (SOTA) block-removal methods. For lighter compression, it performs on par with those methods across several benchmarks for Llama-3.
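The abstract does not state the paper's energy function or how its coefficients are obtained. As a rough illustration only, the sketch below assumes a generic quadratic (QUBO) energy over binary variables x_i (x_i = 1 if block i is removed), E(x) = sum_i h_i x_i + sum_{i<j} J_ij x_i x_j, which is equivalent to an Ising model under the substitution x_i = (1 + s_i)/2 with spins s_i in {-1, +1}. The constraint fixes how many blocks are removed, and every feasible configuration is ranked by energy. The coefficients `h` and `J` and the helper names `energy` and `rank_removals` are hypothetical. Brute-force enumeration is a plainly swapped-in stand-in for whatever efficient ranking procedure the authors use, and it only scales to toy model sizes.

```python
import itertools
from typing import List, Sequence, Tuple


def energy(x: Sequence[int], h: Sequence[float], J: Sequence[Sequence[float]]) -> float:
    """Assumed QUBO/Ising-style energy E(x) = sum_i h_i x_i + sum_{i<j} J_ij x_i x_j.

    x[i] = 1 means transformer block i is removed, 0 means it is kept.
    Only the upper triangle of J (i < j) is read.
    """
    n = len(x)
    linear = sum(h[i] * x[i] for i in range(n))
    quadratic = sum(J[i][j] * x[i] * x[j] for i in range(n) for j in range(i + 1, n))
    return linear + quadratic


def rank_removals(
    h: Sequence[float],
    J: Sequence[Sequence[float]],
    n_remove: int,
    top_k: int = 5,
) -> List[Tuple[float, Tuple[int, ...]]]:
    """Brute-force ranking of every configuration that removes exactly n_remove blocks.

    The equality constraint sum_i x_i = n_remove is enforced by construction,
    since only such configurations are generated. The cost is C(n, n_remove)
    energy evaluations, so this is only practical for small toy examples.
    """
    n = len(h)
    scored = []
    for removed in itertools.combinations(range(n), n_remove):
        x = [1 if i in removed else 0 for i in range(n)]
        scored.append((energy(x, h, J), removed))
    scored.sort()  # lowest energy first (assumed to be the best proxy score)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy, made-up coefficients for a hypothetical 6-block model (NOT from the paper):
    # the first and last blocks are assumed expensive to remove, and removing two
    # adjacent blocks adds an assumed interaction penalty of 0.3.
    h = [5.0, 1.0, 0.5, 0.4, 0.9, 6.0]
    J = [[0.0] * 6 for _ in range(6)]
    for i in range(5):
        J[i][i + 1] = 0.3
    for e, removed in rank_removals(h, J, n_remove=3, top_k=4):
        print(f"remove {removed}: E = {e:.2f}")
```

With these invented numbers the lowest-energy configurations are (2, 3, 4), (1, 2, 3), (1, 3, 4) and (1, 2, 4), in that order. A ranked list like this surfaces several near-optimal candidates, including non-contiguous ones such as (1, 3, 4), rather than a single answer.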