Ideas in Inference-time Scaling can Benefit Generative Pre-training Algorithms

ArXiv CS.AI | 2026-06-02 | Authors: Jiaming Song, Linqi Zhou

Abstract

arXiv:2503.07154v3. Generative pre-training is often framed as a choice between autoregressive models for discrete signals and diffusion models for continuous signals. We argue that this dichotomy is false because it conflates four separate things: model family, data representation, training objective, and inference procedure. Autoregression is an inference procedure that expands a sequence through normalized conditional draws, while diffusion is a refinement procedure that repeatedly revises an existing state. The more useful contrast is therefore not autoregressive versus diffusion, but discrete tokens learned with cross-entropy versus continuous tokens learned with diffusion-style objectives, together with the inference algorithms used to sample from them. From this perspective, algorithmic progress should prioritize inference-time efficiency along two axes: sequence expansion and state refinement.
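The two inference axes named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: `expand_sequence`, `refine_state`, `toy_logits`, and `toy_denoiser` are hypothetical names introduced here, and the toy models stand in for trained networks. The sketch only shows the structural difference: expansion appends tokens drawn from a normalized conditional distribution, while refinement keeps the state's shape fixed and revises every position over several steps.

```python
import math
import random


def expand_sequence(logits_fn, length, rng):
    """Sequence expansion: grow the sequence one token at a time, each token
    drawn from a normalized (softmax) conditional given the prefix."""
    tokens = []
    for _ in range(length):
        logits = logits_fn(tokens)
        peak = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(value - peak) for value in logits]
        # random.choices normalizes the weights internally.
        tokens.append(rng.choices(range(len(logits)), weights=weights)[0])
    return tokens


def refine_state(denoise_fn, state, steps):
    """State refinement: keep the length fixed and repeatedly move the whole
    state toward the model's current estimate of the clean signal."""
    for step in range(steps):
        estimate = denoise_fn(state)
        alpha = 1.0 / (steps - step)  # reaches 1.0 on the final step
        state = [(1 - alpha) * s + alpha * e for s, e in zip(state, estimate)]
    return state


def toy_logits(prefix):
    """Hypothetical stand-in for a trained model: favors repeating the last token."""
    logits = [0.0, 0.0, 0.0, 0.0]
    if prefix:
        logits[prefix[-1]] = 2.0
    return logits


def toy_denoiser(state):
    """Hypothetical stand-in for a trained denoiser: always predicts all ones."""
    return [1.0 for _ in state]


rng = random.Random(0)
tokens = expand_sequence(toy_logits, length=5, rng=rng)
noise = [rng.gauss(0.0, 1.0) for _ in range(3)]
refined = refine_state(toy_denoiser, noise, steps=4)
```

In this sketch the cost of expansion grows with the output length, and the cost of refinement grows with the number of steps, which is one way to read the two efficiency axes the abstract points to.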
