Language Models Need Sleep · Event
PRODUCT_LAUNCH · 2026-05-26 · Impact: MEDIUM
Language Models Need Sleep
arXiv:2605.26099v1 · Announce Type: new
Abstract: Transformer-based large language models are increasingly used for long-horizon tasks; however, their attention mechanism scales poorly with context length. To handle this, we study a sleep-like consolidation mechanism in which a model periodically converts recent context into persistent fast weights before clearing its key-value cache. During sleep, the model performs $N$ offline recurrent passes over the accumulated con
Related coverage
Do Language Models Need Sleep? Offline Recurrence for Improved Online Inference
ArXiv CS.CL · 2026-05-28