L$^3$: Large Lookup Layers Event

PRODUCT_LAUNCH · 2026-06-04 · Impact: MEDIUM

arXiv:2601.21461v3 · Announce Type: replace-cross

Abstract: Modern sparse language models typically achieve sparsity through Mixture-of-Experts (MoE) layers, which dynamically route tokens to dense MLP "experts." However, dynamic hard routing has a number of drawbacks, such as potentially poor hardware efficiency and needing auxiliary losses for stable training. In contrast, the tokenizer embedding table, which is natively sparse, largely avoids these issues by selecting…
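The abstract contrasts dynamic MoE routing with the static, token-indexed sparsity of an embedding table. What follows is a generic toy illustration of that contrast, not the L$^3$ method, whose details are cut off in this excerpt. The dimensions, the top-1 router, the single-matrix "experts", and every function name here are hypothetical choices made for brevity.

```python
import random

# Toy sizes, chosen only for illustration (assumptions, not from the paper).
D_MODEL = 4
N_EXPERTS = 3
VOCAB = 5

rng = random.Random(0)


def rand_matrix(rows, cols):
    """Return a rows x cols list-of-lists with uniform random weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Multiply a rows x cols matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


# --- Generic MoE layer with top-1 hard routing (illustrative) ---
router = rand_matrix(N_EXPERTS, D_MODEL)  # produces one score per expert
experts = [rand_matrix(D_MODEL, D_MODEL) for _ in range(N_EXPERTS)]


def moe_top1(h):
    """Route hidden state h to the single highest-scoring expert."""
    scores = matvec(router, h)
    chosen = max(range(N_EXPERTS), key=lambda i: scores[i])
    # Each "expert" is collapsed to one linear map + ReLU for brevity.
    out = [max(0.0, y) for y in matvec(experts[chosen], h)]
    return chosen, out


def expert_load(hidden_states):
    """Count how many hidden states each expert receives."""
    counts = [0] * N_EXPERTS
    for h in hidden_states:
        chosen, _ = moe_top1(h)
        counts[chosen] += 1
    return counts


# --- Embedding table: the selected row is fixed by the token id ---
embedding = rand_matrix(VOCAB, D_MODEL)


def embed(token_id):
    """Select one row by index; no router or gating is involved."""
    return embedding[token_id]


if __name__ == "__main__":
    tokens = [2, 0, 2, 4]
    hidden = [embed(t) for t in tokens]
    for t, h in zip(tokens, hidden):
        expert, out = moe_top1(h)
        print(f"token {t} -> expert {expert}, output {[round(y, 3) for y in out]}")
    print("expert load:", expert_load(hidden))
```

In a real model the router sees hidden states produced by earlier layers, so the chosen expert depends on context and tokens can pile up on a few experts. This is one reason MoE training commonly adds load-balancing auxiliary losses. The embedding lookup, by contrast, selects its row from the token id alone.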