IAR2: Improving Autoregressive Visual Generation with Semantic-Detail Associated Token Prediction

arXiv cs.CV, 2026-05-28. Authors: Ran Yi, Teng Hu, Zihan Su, Jiangning Zhang, Lizhuang Ma

Abstract

arXiv:2510.06928v2

Autoregressive models have emerged as a powerful paradigm for visual content creation, but they often overlook the intrinsic structural properties of visual data. Our prior work, IAR, took a first step toward addressing this by reorganizing the visual codebook according to embedding similarity, which improved generation robustness. However, IAR is constrained by the rigidity of pre-trained codebooks and by the inaccuracies of hard, uniform clustering. To overcome these limitations, we propose IAR2, an advanced autoregressive framework that enables a hierarchical semantic-detail synthesis process. At the core of IAR2 is a novel Semantic-Detail Associated Dual Codebook, which decouples image representations into a semantic codebook for global semantic information and a detail codebook for fine-grained refinements. Because each token is described by a pair of codes, one from each codebook, this design expands the quantization capacity from a linear to a polynomial scale, significantly enhancing expressiveness.
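The capacity claim can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the two codebooks are combined, so the code below assumes a simple residual-style lookup (nearest semantic code first, then the nearest detail code for what remains). All names (`nearest`, `dual_quantize`, `reconstruct`, `semantic_cb`, `detail_cb`) and all sizes are hypothetical and chosen only for illustration.

```python
import random

def nearest(codebook, vec):
    """Return the index of the codebook entry closest to vec (squared L2 distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )

def dual_quantize(vec, semantic_cb, detail_cb):
    """Map vec to a (semantic index, detail index) pair.

    Assumption (not from the paper): the detail code is picked for the
    residual left over after subtracting the chosen semantic code.
    """
    s = nearest(semantic_cb, vec)
    residual = [v - c for v, c in zip(vec, semantic_cb[s])]
    d = nearest(detail_cb, residual)
    return s, d

def reconstruct(s, d, semantic_cb, detail_cb):
    """Rebuild the vector as semantic code plus detail code."""
    return [a + b for a, b in zip(semantic_cb[s], detail_cb[d])]

rng = random.Random(0)
dim, n_sem, n_det = 4, 8, 16  # toy sizes, purely illustrative

# Random toy codebooks; a real model would learn these.
semantic_cb = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_sem)]
detail_cb = [[rng.uniform(-0.2, 0.2) for _ in range(dim)] for _ in range(n_det)]

stored_entries = n_sem + n_det   # entries kept in memory grow additively
representable = n_sem * n_det    # distinct (semantic, detail) pairs grow multiplicatively

vec = [rng.uniform(-1, 1) for _ in range(dim)]
s, d = dual_quantize(vec, semantic_cb, detail_cb)
recon = reconstruct(s, d, semantic_cb, detail_cb)
```

With these toy sizes, 24 stored entries address 128 distinct code pairs. With two codebooks of size K each, the number of pairs is K squared, which is one reading of the "linear to polynomial" statement in the abstract.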