Muon in Vision Transformers: Optimizer-Recipe Interactions and Gradient Spectra (Event)

PRODUCT_LAUNCH · 2026-05-26 · Impact: MEDIUM

arXiv:2605.24770v1 · Announce Type: cross

Abstract: Muon is a recently developed matrix-aware optimizer that has shown strong results in transformer training, but its behavior in vision transformers (ViTs) is not yet well understood. We study Muon for ViT training, largely on ImageNet-100 and Pl@ntNet-300K, comparing against AdamW under standard vision recipes involving mixup, cutmix, smoothing, and random augmentation…
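To make "matrix-aware" concrete: the publicly described Muon idea treats each 2-D weight's momentum as a matrix and replaces it with an approximately orthogonal matrix (its polar factor U Vᵀ) before stepping. This flattens the update's singular-value spectrum while keeping its singular vectors. The NumPy sketch below illustrates that idea under stated assumptions and is not the paper's implementation. It uses the classical cubic Newton-Schulz iteration instead of the tuned quintic iteration in the public Muon reference code, and plain heavy-ball momentum instead of Nesterov momentum. The function names (`newton_schulz_orthogonalize`, `muon_step`) and hyperparameters (`lr=0.02`, `beta=0.95`, `steps=40`) are hypothetical.

```python
import numpy as np


def newton_schulz_orthogonalize(g, steps=40, eps=1e-7):
    """Approximate the polar factor U @ V.T of a 2-D matrix g = U @ diag(S) @ V.T.

    Classical cubic Newton-Schulz iteration X <- 1.5 X - 0.5 (X X^T) X, which
    drives every nonzero singular value toward 1 while keeping U and V fixed.
    (Swapped-in choice: public Muon code uses a tuned quintic iteration instead.)
    """
    x = g / (np.linalg.norm(g) + eps)  # Frobenius scaling puts all singular values in [0, 1]
    transposed = x.shape[0] > x.shape[1]
    if transposed:
        x = x.T  # work on the wide orientation so the Gram matrix x @ x.T is smaller
    for _ in range(steps):
        x = 1.5 * x - 0.5 * (x @ x.T) @ x
    return x.T if transposed else x


def muon_step(weight, grad, momentum_buf, lr=0.02, beta=0.95):
    """One hypothetical Muon-style step for a 2-D weight matrix.

    Assumptions: heavy-ball momentum (not Nesterov) and no shape-dependent
    learning-rate scaling; both differ from some published variants.
    """
    momentum_buf = beta * momentum_buf + grad  # accumulate momentum
    update = newton_schulz_orthogonalize(momentum_buf)  # flatten its spectrum
    return weight - lr * update, momentum_buf


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((4, 6))
    g = rng.standard_normal((4, 6))
    w_new, buf = muon_step(w, g, np.zeros_like(w))
    applied_update = (w - w_new) / 0.02  # recover the orthogonalized update
    print("gradient singular values:", np.linalg.svd(g, compute_uv=False))
    print("update singular values:  ", np.linalg.svd(applied_update, compute_uv=False))
```

Comparing the two printed spectra shows the contrast a Muon-style update introduces: the raw gradient's singular values are uneven, while the orthogonalized update's are all close to 1. AdamW, by contrast, rescales each entry independently and has no such matrix-level effect.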