Latent Recurrent Transformer: Architecture Exploration, Training Strategies, and Scaling Behavior
Event: PRODUCT_LAUNCH
Date: 2026-05-27
Impact: MEDIUM
arXiv:2605.26797v1
Announce Type: cross
Abstract: We study Latent Recurrent Transformer (LRT), a lightweight augmentation of autoregressive transformers that reuses a high-level source-layer hidden state from the previous token as recurrent memory for the next token. Because this source state is already computed during ordinary decoding, LRT adds a cross-layer recurrent latent pathway across positi
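The recurrence described in the abstract can be illustrated with a toy sketch. The abstract is cut off before it says where the carried state is injected or how it is combined, so everything beyond "reuse the source-layer hidden state of the previous token" is an assumption here: the function names, the additive injection at the bottom-layer input, the scalar `mix`, and the `tanh` stand-in for a transformer block (attention is omitted entirely) are all hypothetical and not taken from the paper.

```python
import math


def layer(h, weight):
    # Hypothetical stand-in for a transformer block: elementwise tanh of a
    # scaled input. Real blocks use attention and MLPs; both are omitted here.
    return [math.tanh(weight * x) for x in h]


def decode_with_latent_recurrence(embeddings, weights, source_layer, mix=0.5):
    """Run a toy layer stack token by token, carrying one hidden state forward.

    embeddings   : list of per-token input vectors (lists of floats)
    weights      : one scalar per layer (stand-in for block parameters)
    source_layer : index of the layer whose output is reused as memory
    mix          : assumed scalar controlling how much memory is injected
    """
    memory = [0.0] * len(embeddings[0])  # position 0 has no previous token
    outputs = []
    for emb in embeddings:
        # Assumption: the memory is added to the bottom-layer input.
        h = [e + mix * m for e, m in zip(emb, memory)]
        next_memory = memory
        for idx, w in enumerate(weights):
            h = layer(h, w)
            if idx == source_layer:
                # This state is computed anyway during decoding; it is only
                # stored for the next position, not recomputed.
                next_memory = h
        memory = next_memory
        outputs.append(h)
    return outputs
```

With `mix=0.0` the sketch reduces to an ordinary per-token layer stack, so two identical input embeddings give identical outputs. With a nonzero `mix`, the second position also sees the first position's source-layer state, which is the cross-position pathway the abstract refers to.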