Rethinking Layer Redundancy: Calibration Matters More Than Search in LLM Depth Pruning

ArXiv CS.CL | 2026-05-28 | Authors: Minkyu Kim, Vincent-Daniel Yun, Youngrae Kim, Suin Cho, Woosang Lim, Sunwoo Lee

Abstract

arXiv:2604.24938v3

Depth pruning improves the inference efficiency of large language models by removing Transformer blocks. Prior work typically treats layer redundancy as an inherent structural property of pretrained networks, emphasizing importance criteria and search algorithms to identify removable layers. In this study, we empirically investigate depth pruning from a functional perspective. Evaluating representative LLM families across diverse calibration configurations and multiple search algorithms, we show that different configurations produce different pruning patterns. Furthermore, under a fixed calibration configuration, complex search algorithms yield only marginal performance improvements over simple one-shot methods and converge to similar pruned subsets.
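To make the idea of a calibration-dependent, one-shot depth pruning pass concrete, here is a minimal sketch in pure Python. It is not the authors' method: the abstract does not specify an importance criterion, so the score used here (one minus the mean cosine similarity between a block's input and output) is a commonly used heuristic chosen purely for illustration. The toy residual blocks and every name (`make_toy_blocks`, `block_forward`, `layer_scores`, `one_shot_prune`) are hypothetical stand-ins, not part of any real library.

```python
import math
import random


def make_toy_blocks(num_layers, dim, seed=0):
    """Hypothetical stand-in for Transformer blocks: one random dim x dim matrix per layer."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 0.05 * (layer + 1)) for _ in range(dim)] for _ in range(dim)]
        for layer in range(num_layers)
    ]


def block_forward(weight, hidden):
    """Residual update h + tanh(h W) applied to a single hidden vector."""
    dim = len(hidden)
    return [
        hidden[j] + math.tanh(sum(hidden[i] * weight[i][j] for i in range(dim)))
        for j in range(dim)
    ]


def cosine(a, b):
    """Cosine similarity between two vectors, with a small epsilon for stability."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / (norm + 1e-12)


def layer_scores(blocks, calibration):
    """Score each layer as 1 - mean cosine(input, output) over the calibration vectors.

    Assumption: a low score means the block barely changes its input on this
    calibration data, so it is treated as a candidate for removal.
    """
    hidden_states = [list(v) for v in calibration]
    scores = []
    for weight in blocks:
        outputs = [block_forward(weight, h) for h in hidden_states]
        mean_cos = sum(cosine(h, o) for h, o in zip(hidden_states, outputs)) / len(outputs)
        scores.append(1.0 - mean_cos)
        hidden_states = outputs
    return scores


def one_shot_prune(blocks, calibration, num_remove):
    """Score every layer once, then drop the num_remove lowest-scoring layers."""
    scores = layer_scores(blocks, calibration)
    ranked = sorted(range(len(blocks)), key=lambda i: scores[i])
    removed = sorted(ranked[:num_remove])
    kept = [i for i in range(len(blocks)) if i not in removed]
    return kept, removed, scores


if __name__ == "__main__":
    blocks = make_toy_blocks(num_layers=8, dim=16)
    # Two hypothetical calibration sets drawn from different distributions.
    rng_a, rng_b = random.Random(1), random.Random(2)
    calib_a = [[rng_a.gauss(0.0, 1.0) for _ in range(16)] for _ in range(32)]
    calib_b = [[rng_b.uniform(0.0, 3.0) for _ in range(16)] for _ in range(32)]
    for name, calib in (("A", calib_a), ("B", calib_b)):
        kept, removed, _ = one_shot_prune(blocks, calib, num_remove=3)
        print(f"calibration {name}: removed layers {removed}, kept layers {kept}")
```

Because the scores are computed from hidden states produced by the calibration inputs, changing the calibration set can change which layers rank lowest. This toy setup only illustrates that dependency and says nothing about the magnitude of the effect reported in the paper.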
