Depth Registers Unlock W4A4 on SwiGLU: A Reader/Generator Decomposition

Event: PRODUCT_LAUNCH | Date: 2026-05-26 | Impact: MEDIUM

arXiv:2604.18128v2
Announce Type: replace

Abstract: We study post-training W4A4 quantization in a controlled 300M-parameter SwiGLU decoder-only language model trained on 5B tokens of FineWeb-Edu, and ask which input-activation sites dominate the error. Naive round-to-nearest W4A4 collapses validation perplexity from FP16 23.6 to 1727. A simple residual-axis training-time intervention -- Depth Registers with a register-magni
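
For readers unfamiliar with the baseline the abstract refers to, here is a minimal sketch of naive round-to-nearest W4A4 "fake quantization" in plain Python. It is an illustration only, not the paper's implementation: the symmetric signed grid, the per-tensor activation scale, the per-output-row weight scale, and the function names `rtn_quantize` and `w4a4_linear` are all assumptions, since the abstract does not state these details.

```python
def rtn_quantize(values, bits=4):
    """Symmetric round-to-nearest fake quantization of a list of floats.

    Assumptions (not taken from the abstract): one scale per list and a
    signed integer grid of [-8, 7] for 4 bits. Returns the dequantized
    values and the scale that was used.
    """
    qmax = 2 ** (bits - 1) - 1      # 7 for 4 bits
    qmin = -(2 ** (bits - 1))       # -8 for 4 bits
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        return list(values), 1.0    # nothing to scale
    scale = max_abs / qmax          # largest magnitude maps to qmax
    dequantized = []
    for v in values:
        # Python's round() rounds exact halves to the nearest even integer.
        q = round(v / scale)
        q = max(qmin, min(qmax, q))  # clamp onto the 4-bit grid
        dequantized.append(q * scale)
    return dequantized, scale


def w4a4_linear(weight_rows, activations):
    """Compute y = W x with weights and activations both quantized to 4 bits.

    Assumption: weights use one scale per output row, activations use a
    single per-tensor scale.
    """
    x_q, _ = rtn_quantize(activations)
    outputs = []
    for row in weight_rows:
        w_q, _ = rtn_quantize(row)
        outputs.append(sum(w * x for w, x in zip(w_q, x_q)))
    return outputs


if __name__ == "__main__":
    # One large activation stretches the scale so the small entries round to 0.
    acts = [0.1, -0.2, 0.05, 7.0]
    deq, scale = rtn_quantize(acts)
    print(deq, scale)
```

In this toy input, a single large activation sets the scale, and every small entry lands on the zero level of the 4-bit grid. This is the general kind of outlier sensitivity that makes it natural to ask which input-activation sites dominate the error.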