FocusDiT: Masking Queries in Diffusion Transformers for Fine-grained Image Generation

Event type: PRODUCT_LAUNCH | Date: 2026-06-02 | Impact: MEDIUM

arXiv:2606.02090v1 Announce Type: new

Abstract: The diffusion transformer (DiT) has been widely adopted in generative diffusion modeling; it progressively denoises query tokens through attention and feed-forward network (FFN) layers. The FFN acts as a key-value vocabulary for decoding visual content, where the values embed visual semantic knowledge. We present that focusing on critical query tokens c