KromHC: Manifold-Constrained Hyper-Connections with Kronecker-Product Residual Matrices

ArXiv CS.CL · 2026-06-02 · Authors: Wuyang Zhou, Yuxuan Gu, Giorgos Iacovides, Danilo Mandic

Abstract

arXiv:2601.21579v2 (announce type: replace)

The success of Hyper-Connections (HC) in neural networks (NN) has also highlighted issues related to training instability and restricted scalability. Manifold-Constrained Hyper-Connections (mHC) mitigate these challenges by projecting the residual connection space onto a Birkhoff polytope. However, mHC faces two issues: 1) its iterative Sinkhorn-Knopp (SK) algorithm does not always yield exactly doubly stochastic residual matrices; 2) it incurs a prohibitive $O(n^3C)$ parameter complexity, where $n$ is the width of the residual stream and $C$ is the feature dimension. The recently proposed mHC-lite reparametrizes the residual matrix via the Birkhoff-von Neumann theorem to guarantee double stochasticity, but it also faces a factorial explosion in its parameter complexity, $O \left( nC \cdot n! \right)$.
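The two routes to a doubly stochastic matrix contrasted in the abstract can be illustrated with a small, standard-library-only sketch. This is not the authors' code: the function names `sinkhorn_knopp`, `max_row_sum_error`, and `birkhoff_combination` are hypothetical, and the softmax weighting over permutation matrices is an assumption made for illustration, not a confirmed detail of mHC-lite. The sketch only shows that a truncated SK iteration leaves a residual row-sum error, while a convex combination of permutation matrices is doubly stochastic by construction but needs $n!$ terms.

```python
import itertools
import math
import random


def sinkhorn_knopp(matrix, num_iters):
    """Alternately normalize rows and columns of a positive square matrix."""
    m = [row[:] for row in matrix]
    n = len(m)
    for _ in range(num_iters):
        # Row step: every row sums to 1 afterwards.
        m = [[x / sum(row) for x in row] for row in m]
        # Column step: every column sums to 1, but rows can drift again.
        col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
        m = [[m[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return m


def max_row_sum_error(m):
    """Largest deviation of any row sum from 1."""
    return max(abs(sum(row) - 1.0) for row in m)


def birkhoff_combination(logits, n):
    """Softmax-weighted convex combination of all n! permutation matrices.

    The softmax weighting is an illustrative assumption.
    """
    perms = list(itertools.permutations(range(n)))
    assert len(logits) == len(perms) == math.factorial(n)
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    m = [[0.0] * n for _ in range(n)]
    for e, perm in zip(exps, perms):
        for i, j in enumerate(perm):
            # Each permutation places exactly one weight per row and column.
            m[i][j] += e / total
    return m


random.seed(0)
n = 4
raw = [[math.exp(random.gauss(0.0, 2.0)) for _ in range(n)] for _ in range(n)]
sk = sinkhorn_knopp(raw, num_iters=3)
logits = [random.gauss(0.0, 1.0) for _ in range(math.factorial(n))]
bvn = birkhoff_combination(logits, n)

print("SK row-sum error after 3 iterations:", max_row_sum_error(sk))
print("Birkhoff combination row-sum error:", max_row_sum_error(bvn))
print("permutation matrices needed for n = 4:", math.factorial(n))
```

Even at $n = 4$ the second construction already needs 24 weights per residual matrix, which gives a feel for why the factorial term in the parameter count becomes impractical as the residual stream widens.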