Double Preconditioning (DoPr): Optimization for Test-Time Performance, not Validation Loss

ArXiv CS.AI | 2026-06-06 | NEWS | en | Authors: Thomas T. Zhang, Alok Shah, Yifei Zhang, Vincent Zhang, Nikolai Matni, Max Simchowitz

Details

Source site: ArXiv CS.AI
Authors: Thomas T. Zhang, Alok Shah, Yifei Zhang, Vincent Zhang, Nikolai Matni, Max Simchowitz
Article type: NEWS
Language: en
Publication date: 2026-06-06

Abstract

arXiv:2606.06418v1 (announce type: cross)

Many modern applications of deep learning involve training a neural network via a one-step prediction loss (e.g., $L^2$ regression, cross-entropy), but deploy the network by rolling out along its own predictions. Key examples include autoregressive language modeling, flow-based generative modeling, and robot policy learning. It is well-documented that these settings induce a phenomenon we call test-time feedback (TTF): the mismatch between the training/validation loss and downstream metrics of interest, such as task success rate and generation quality, which grows with task length. While data curation, architecture, and objective design have been proposed to combat train-test shift in TTF settings, this paper proposes optimization as a new design axis to mitigate error accumulation. Specifically, we introduce a new optimization paradigm called double-preconditioning (DoPr) uniquely tailored to the challenges of TTF.
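The gap the abstract describes, between a one-step loss and rollout performance, can be illustrated with a toy example. The sketch below is not DoPr and is not taken from the paper; it assumes a scalar linear system x_{t+1} = a * x_t and a learned coefficient that is off by 1%. All names (`one_step_error`, `rollout_error`, `A_TRUE`, `A_HAT`) are hypothetical and chosen only for illustration.

```python
def one_step_error(a_true, a_hat, states):
    """Mean absolute one-step prediction error on ground-truth states.

    This plays the role of a validation loss: the model always starts
    from a true state, so its own errors never feed back into its inputs.
    """
    return sum(abs(a_hat * x - a_true * x) for x in states) / len(states)


def rollout_error(a_true, a_hat, x0, horizon):
    """Absolute gap after `horizon` steps when the model consumes its own predictions."""
    x_true, x_model = x0, x0
    for _ in range(horizon):
        x_true = a_true * x_true    # ground-truth trajectory
        x_model = a_hat * x_model   # model rolled out on its own output
    return abs(x_model - x_true)


# Assumed toy values: the learned coefficient is off by 0.01.
A_TRUE, A_HAT, X0 = 1.0, 1.01, 1.0

# Ground-truth states used for the one-step (validation-style) evaluation.
true_states = [X0 * A_TRUE ** t for t in range(100)]
val_loss = one_step_error(A_TRUE, A_HAT, true_states)

# Rollout gap at increasing horizons (task lengths).
gaps = {h: rollout_error(A_TRUE, A_HAT, X0, h) for h in (1, 10, 100)}

print(f"one-step error: {val_loss:.4f}")
for horizon, gap in gaps.items():
    print(f"rollout gap at horizon {horizon}: {gap:.4f}")
```

In this toy setting the one-step error stays fixed at the size of the coefficient error, while the rollout gap compounds with the horizon. That is the kind of length-dependent mismatch the abstract attributes to TTF settings.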
