AdaSD: Adaptive Speculative Decoding for Efficient Language Model Inference

ArXiv cs.CL, 2026-05-27. Authors: Kuan-Wei Lu, Ding-Yong Hong, Pangfeng Liu, Jan-Jan Wu

Abstract

arXiv:2512.11280v2. Large language models (LLMs) have achieved remarkable performance across a wide range of tasks, but their increasing parameter sizes significantly slow down inference. Speculative decoding mitigates this issue by leveraging a smaller draft model to predict candidate tokens, which are then verified by a larger target model. However, existing approaches often require additional training, extensive hyperparameter tuning, or prior analysis of models and tasks before deployment. In this paper, we propose Adaptive Speculative Decoding (AdaSD), a hyperparameter-free decoding scheme that dynamically adjusts generation length and acceptance criteria during inference. AdaSD introduces two adaptive components: one to determine when to stop candidate token generation and the other to decide token acceptance, both updated in real time based on token entropy and Jensen-Shannon distance.
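
The abstract names two signals, token entropy and Jensen-Shannon distance, without giving the update rules. As a rough illustration only, the sketch below computes both quantities for a pair of next-token distributions using their standard definitions. The `RunningMean` class is a hypothetical stand-in for "updated in real time": it is an assumption made for this example and is not taken from the paper.

```python
import math

def entropy(p):
    """Shannon entropy in bits of a probability distribution p."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def js_distance(p, q):
    """Jensen-Shannon distance: square root of the base-2 JS divergence, in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # KL divergence in bits; terms with a zero numerator contribute nothing.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(jsd, 0.0))  # clamp tiny negative rounding errors

class RunningMean:
    """Hypothetical online threshold (an assumption, not the paper's rule)."""

    def __init__(self):
        self.count = 0
        self.value = 0.0

    def update(self, x):
        # Incremental mean: no history needs to be stored.
        self.count += 1
        self.value += (x - self.value) / self.count
        return self.value

# Toy next-token distributions over a 3-token vocabulary (made-up numbers).
draft = [0.7, 0.2, 0.1]
target = [0.6, 0.3, 0.1]
draft_entropy = entropy(draft)          # how uncertain the draft model is
disagreement = js_distance(draft, target)  # how far draft and target diverge
```

Intuitively, high draft entropy suggests the draft model is unsure and drafting could stop early, while a small Jensen-Shannon distance suggests the draft and target models agree on the token. How AdaSD turns these signals into stop and accept decisions is specified in the paper, not here.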
