Grokking or Glitching? How Low-Precision Drives Slingshot Loss Spikes

arXiv cs.CL, 2026-05-27. Authors: Liu Hanqing, Jianjun Cao, Yuanze Li, Zijian Zhou

Abstract

arXiv:2605.06152v3 (announce type: replace-cross)

Deep neural networks exhibit periodic loss spikes during long-term training without regularization, a phenomenon known as the "Slingshot Mechanism." Existing work usually attributes it to intrinsic optimization dynamics, but its triggering mechanism has remained unclear. This paper proves that the phenomenon results from the precision limits of floating-point arithmetic. As training enters a high-confidence stage, the gap between the correct-class logit and the other logits can exceed the absorption-error threshold. Then, during backpropagation, the gradient of the correct class is rounded exactly to zero while the gradients of the incorrect classes remain nonzero. This breaks the zero-sum constraint on gradients across classes and introduces a systematic drift into the parameter updates of the classifier layer.
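The rounding effect described above can be reproduced in a toy setting. What follows is a minimal sketch under our own assumptions, not the authors' code. It assumes a plain softmax cross-entropy output, whose gradient with respect to the logits is p - y (y being the one-hot target), so in exact arithmetic the per-class gradients sum to zero. The helpers `to_f32` and `softmax_ce_grad` are hypothetical. Float32 is emulated by rounding every intermediate result with Python's `struct` module rather than by running a real training kernel, which may fuse or reorder these operations. With a margin of 20 logits, each wrong-class term exp(-20) ≈ 2e-9 is far below half an ulp of 1.0 in float32 (2^-24 ≈ 6e-8). The softmax normalizer therefore rounds to exactly 1, the correct-class probability becomes exactly 1, and its gradient becomes exactly 0. This rounding condition is only an illustration and may differ from the paper's precise absorption-error threshold.

```python
import math
import struct


def to_f32(x: float) -> float:
    """Round a Python float (IEEE 754 double) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def softmax_ce_grad(logits, target, rnd=float):
    """Gradient of softmax cross-entropy w.r.t. the logits: p - onehot(target).

    Hypothetical helper. `rnd` rounds every intermediate result to emulate a
    precision: `float` keeps float64 values, `to_f32` emulates float32.
    """
    m = max(logits)
    # Numerically stable softmax: shift by the max logit before exponentiating.
    exps = [rnd(math.exp(rnd(z - m))) for z in logits]
    total = 0.0
    for e in exps:
        total = rnd(total + e)  # sequential accumulation, rounded at each step
    probs = [rnd(e / total) for e in exps]
    # Subtract the one-hot target: only the correct class gets "- 1".
    return [rnd(p - (1.0 if i == target else 0.0)) for i, p in enumerate(probs)]


# Correct class 0 leads every other class by a margin of 20 logits.
logits = [20.0, 0.0, 0.0, 0.0]

for name, rnd in (("float64", float), ("float32", to_f32)):
    g = softmax_ce_grad(logits, target=0, rnd=rnd)
    print(f"{name}: correct-class grad = {g[0]:.3e}, sum of grads = {sum(g):.3e}")
```

In the emulated float32 run, the correct-class gradient comes out exactly zero while the wrong-class gradients stay positive. The gradients therefore sum to a small positive residual (three copies of exp(-20), about 6e-9) instead of zero. The float64 run keeps a small negative correct-class gradient and a sum that is zero up to double-precision rounding. The sketch only shows the broken zero-sum constraint; it does not model how the residual accumulates into classifier-layer drift.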
