Two Speeds of Learning: A Representation-Readout Decomposition of Grokking and Double Descent
Event: PRODUCT_LAUNCH | Date: 2026-05-27 | Impact: MEDIUM
arXiv:2605.27078v1 Announce Type: cross
Abstract: Training loss and accuracy are the standard signals used to monitor generalization during deep neural network training. Two well-documented phenomena complicate this picture: in grokking, train loss falls rapidly while test performance improves abruptly only after a long delay; in epoch-wise double descent, train loss decreases monotonically while test