Latent Recurrent Transformer: Architecture Exploration, Training Strategies, and Scaling Behavior

PRODUCT_LAUNCH · 2026-05-27 · Impact: MEDIUM

arXiv:2605.26797v1 · Announce Type: cross

Abstract: We study Latent Recurrent Transformer (LRT), a lightweight augmentation of autoregressive transformers that reuses a high-level source-layer hidden state from the previous token as recurrent memory for the next token. Because this source state is already computed during ordinary decoding, LRT adds a cross-layer recurrent latent pathway across positi
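The recurrent pathway described in the abstract can be illustrated with a toy decoding loop. This is only a sketch under stated assumptions, not the paper's implementation: the names `make_layer`, `decode`, `source_layer`, `inject_layer`, and `gate` are hypothetical, each "layer" is a random linear map with tanh standing in for a transformer block, attention is omitted entirely, and the gated additive injection is an assumed mixing rule that the truncated abstract does not specify.

```python
import math
import random


def make_layer(dim, rng):
    """Toy stand-in for a transformer block: random linear map followed by tanh."""
    w = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]

    def layer(x):
        return [math.tanh(sum(w_ij * x_j for w_ij, x_j in zip(row, x))) for row in w]

    return layer


def decode(token_embeddings, layers, source_layer, inject_layer, gate):
    """Process tokens one at a time, carrying a source-layer state across positions."""
    dim = len(token_embeddings[0])
    memory = [0.0] * dim  # position 0 has no previous token
    outputs = []
    for x in token_embeddings:
        h = list(x)
        next_memory = memory
        for i, layer in enumerate(layers):
            if i == inject_layer:
                # Assumption: gated additive mix of the previous token's source state.
                h = [h_k + gate * m_k for h_k, m_k in zip(h, memory)]
            h = layer(h)
            if i == source_layer:
                # This state is computed anyway; it is simply kept for the next token.
                next_memory = h
        memory = next_memory
        outputs.append(h)
    return outputs


rng = random.Random(0)
layers = [make_layer(4, rng) for _ in range(3)]
tokens = [[0.1, 0.2, 0.3, 0.4], [0.5, -0.1, 0.0, 0.2]]
# Read the state after the last layer (index 2) and inject it before the first (index 0).
states = decode(tokens, layers, source_layer=2, inject_layer=0, gate=1.0)
```

Because the toy has no attention, the carried state is the only link between positions: with `gate=0.0` each output depends on its own token alone, and with a nonzero `gate` the previous token's source-layer state influences the next position.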
