Residual Connections Harm Generative Representation Learning

ArXiv CS.CV | 2026-05-26 | Authors: Xiao Zhang, Ruoxi Jiang, William Gao, Rebecca Willett, Michael Maire

Abstract

arXiv:2404.10947v5

We show that introducing a weighting factor to reduce the influence of identity shortcuts in residual networks significantly enhances semantic feature learning in generative representation learning frameworks, such as masked autoencoders (MAEs) and diffusion models. Our modification notably improves feature quality, raising ImageNet-1K K-Nearest Neighbor accuracy from 27.4% to 63.9% and linear probing accuracy from 67.8% to 72.7% for MAEs with a ViT-B/16 backbone, while also enhancing generation quality in diffusion models. This significant gap suggests that, while residual connection structure serves an essential role in facilitating gradient propagation, it may have a harmful side effect of reducing capacity for abstract learning by virtue of injecting an echo of shallower representations into deeper layers.
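To make the idea of down-weighting the identity shortcut concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract only says that a weighting factor reduces the influence of the shortcut, so the update rule `h = alpha * h + f(h)`, the linear decay schedule, and the names `linear_decay_weights`, `weighted_residual_forward`, and `identity_share` are all hypothetical choices made for this example.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def linear_decay_weights(num_layers: int, min_weight: float) -> List[float]:
    """Hypothetical schedule: shortcut weight falls linearly from 1.0 to min_weight.

    This schedule is an assumption for illustration; the abstract does not
    specify how the weighting factor is chosen.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be >= 1")
    if num_layers == 1:
        return [1.0]
    step = (1.0 - min_weight) / (num_layers - 1)
    return [1.0 - step * i for i in range(num_layers)]


def weighted_residual_forward(
    x: Sequence[float],
    branches: Sequence[Callable[[Vector], Vector]],
    weights: Sequence[float],
) -> Vector:
    """Apply h <- alpha * h + f(h) layer by layer.

    With every alpha equal to 1.0 this reduces to a standard residual stack.
    """
    if len(branches) != len(weights):
        raise ValueError("need exactly one shortcut weight per layer")
    h = list(x)
    for f, alpha in zip(branches, weights):
        out = f(h)
        h = [alpha * hi + oi for hi, oi in zip(h, out)]
    return h


def identity_share(weights: Sequence[float]) -> float:
    """Coefficient on the raw input that survives the shortcut path alone.

    This is the product of the per-layer shortcut weights, i.e. how strong
    the 'echo' of the input remains at the final layer.
    """
    share = 1.0
    for alpha in weights:
        share *= alpha
    return share


def zero_branch(h: Vector) -> Vector:
    """Stand-in for a learned block; returns zeros to isolate the shortcut path."""
    return [0.0 for _ in h]
```

With four layers and a minimum weight of 0.25, the shortcut path carries only a small fraction of the input to the last layer, whereas a standard residual stack (all weights 1.0) carries it through unchanged. That contrast is the "echo of shallower representations" the abstract refers to; how the real model chooses its weights is not stated here.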