Learning in the Recurrent State: Gradient Descent with Linear Recurrent Networks

ArXiv CS.AI | 2026-06-16 | NEWS | en
Authors: Yudou Tian, Neeraj Mohan Sushma, Harshvardhan Mestha, Nicolo Colombo, David Kappel, Anand Subramoney



Details

Source: ArXiv CS.AI
Authors: Yudou Tian, Neeraj Mohan Sushma, Harshvardhan Mestha, Nicolo Colombo, David Kappel, Anand Subramoney
Article type: NEWS
Language: en
Published: 2026-06-16


Abstract

arXiv:2410.11687v3 (announce type: replace-cross)

Linear recurrent networks (LRNNs) offer linear-time sequence modeling, but standard recurrent updates do not directly expose the supervised products needed for in-context gradient descent. We propose a sufficient constructive inductive bias for LRNNs: equip a diagonal recurrent state with multiplicative readout and a short sliding-window cross-product self-attention update. The resulting architecture, Gradient-based Recurrent In-context Learner (GRIL), can implement minibatch gradient descent on a task-specific linear predictor during a single forward pass. The same design extends to multi-step updates and cross-entropy classification, with a limited MLP-based extension to non-linear regression. Empirically, trained GRILs recover the behavior and parameters predicted by the construction on synthetic ICL tasks, and the same architectural bias yields useful performance on Long Range Arena and language modelling.
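A minimal sketch of the general principle behind in-context gradient descent, not the GRIL architecture itself. For a linear predictor W trained with squared loss L(W) = 1/(2B) Σ ||W x_i − y_i||², the minibatch gradient is ∇L(W) = W A − C, where A = (1/B) Σ x_i x_iᵀ and C = (1/B) Σ y_i x_iᵀ. Both sums can be accumulated by an additive (linear) recurrence over the context, so gradient descent can then run from that state alone. Reading the abstract's "supervised products" as terms like y_i x_iᵀ is an assumption on our part. The sketch uses a full, undecayed matrix state and plain Python lists; it omits the diagonal state, multiplicative readout, and sliding-window cross-product attention the paper proposes, and the helper names (`recurrent_summary`, `gd_from_state`, `gd_direct`) are hypothetical.

```python
# Illustrative sketch (assumption): shows why accumulated x x^T and y x^T
# statistics suffice for GD on a linear predictor. This is NOT the GRIL
# construction from the paper; all names here are hypothetical.
from typing import List, Tuple

Matrix = List[List[float]]


def zeros(rows: int, cols: int) -> Matrix:
    return [[0.0] * cols for _ in range(rows)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def recurrent_summary(xs: Matrix, ys: Matrix) -> Tuple[Matrix, Matrix]:
    """Scan the context once, adding x x^T and y x^T to a running state.

    Each update is purely additive (a linear recurrence with decay 1), so the
    final state holds the sums; dividing by B gives the averages A and C.
    """
    d, k = len(xs[0]), len(ys[0])
    s_xx, s_yx = zeros(d, d), zeros(k, d)
    for x, y in zip(xs, ys):
        for i in range(d):
            for j in range(d):
                s_xx[i][j] += x[i] * x[j]
        for i in range(k):
            for j in range(d):
                s_yx[i][j] += y[i] * x[j]
    b = float(len(xs))
    a = [[v / b for v in row] for row in s_xx]
    c = [[v / b for v in row] for row in s_yx]
    return a, c


def gd_from_state(a: Matrix, c: Matrix, lr: float, steps: int) -> Matrix:
    """Full-batch GD from W = 0 using only the state: grad L(W) = W A - C."""
    k, d = len(c), len(c[0])
    w = zeros(k, d)
    for _ in range(steps):
        wa = matmul(w, a)
        w = [[w[i][j] - lr * (wa[i][j] - c[i][j]) for j in range(d)] for i in range(k)]
    return w


def gd_direct(xs: Matrix, ys: Matrix, lr: float, steps: int) -> Matrix:
    """Reference: the same GD, with the gradient computed from the raw examples."""
    k, d, b = len(ys[0]), len(xs[0]), len(xs)
    w = zeros(k, d)
    for _ in range(steps):
        grad = zeros(k, d)
        for x, y in zip(xs, ys):
            pred = [sum(w[i][j] * x[j] for j in range(d)) for i in range(k)]
            for i in range(k):
                for j in range(d):
                    grad[i][j] += (pred[i] - y[i]) * x[j] / b
        w = [[w[i][j] - lr * grad[i][j] for j in range(d)] for i in range(k)]
    return w


if __name__ == "__main__":
    xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    ys = [[2.0], [-1.0], [1.0]]  # generated by W* = [[2, -1]]
    a, c = recurrent_summary(xs, ys)
    print(gd_from_state(a, c, lr=0.5, steps=200))  # approximately [[2.0, -1.0]]
```

With W initialized at zero, a single step gives W = lr · C, so the prediction for a query x_q is lr · (1/B) Σ y_i (x_iᵀ x_q), the linear-attention form of one in-context gradient step; further steps also need the x xᵀ statistics, which is why the sketch keeps both.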
