Inverse Depth Scaling From Most Layers Being Similar · Event

PRODUCT_LAUNCH · 2026-06-02 · Impact: MEDIUM

arXiv:2602.05970v2 · Announce Type: replace-cross

Abstract: Neural scaling laws relate loss to model size in large language models (LLMs), yet depth and width may contribute to performance differently, requiring more detailed studies. Here, we quantify how depth affects loss via analysis of LLMs and toy residual networks. We find loss scales inversely proportional to depth in LLMs, probably due to functionally similar layers reducing error thro
