River-LLM: Large Language Model Seamless Exit Based on KV Share (Event)

PRODUCT_LAUNCH | 2026-05-26 | Impact: MEDIUM

arXiv:2604.18396v3. Announce Type: replace.

Abstract: Large Language Models (LLMs) have demonstrated exceptional performance across diverse domains but are increasingly constrained by high inference latency. Early Exit has emerged as a promising solution to accelerate inference by dynamically bypassing redundant layers. However, in decoder-only architectures, the efficiency of Early Exit is severely bottlenecked by the KV Cache Absen
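
The interaction between Early Exit and the KV cache in a decoder-only model can be illustrated with a toy sketch. It is not River-LLM's method, which the truncated abstract does not describe; it only shows the general effect that a token which exits early leaves no key/value entries in the layers it skipped, so later tokens that run deeper find gaps there. The function names, the four-layer depth, and the exit schedule are all hypothetical.

```python
from typing import Dict, List, Set

NUM_LAYERS = 4  # hypothetical depth of the toy decoder


def decode_with_early_exit(
    exit_layers: List[int], num_layers: int = NUM_LAYERS
) -> Dict[int, Set[int]]:
    """Simulate which token positions get a KV entry in each layer.

    exit_layers[t] is the last layer (0-indexed) that token t runs
    before exiting. This schedule is an assumption for illustration;
    a real system would decide it from a confidence signal.
    """
    kv_cache: Dict[int, Set[int]] = {layer: set() for layer in range(num_layers)}
    for position, exit_layer in enumerate(exit_layers):
        # A token only writes K/V in the layers it actually executes.
        for layer in range(exit_layer + 1):
            kv_cache[layer].add(position)
    return kv_cache


def missing_entries(
    kv_cache: Dict[int, Set[int]], num_tokens: int
) -> Dict[int, List[int]]:
    """Per layer, list the token positions with no cached K/V."""
    all_positions = set(range(num_tokens))
    return {
        layer: sorted(all_positions - present)
        for layer, present in kv_cache.items()
    }


# Token 1 exits after layer 1; tokens 0 and 2 run all four layers.
schedule = [3, 1, 3]
gaps = missing_entries(decode_with_early_exit(schedule), len(schedule))
print(gaps)
```

With this schedule, layers 2 and 3 have no entry for token 1, so token 2 cannot attend to it there without some extra mechanism to fill or share the missing state.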