Addressing Detail Bottlenecks in Latent Diffusion for RGB-to-SWIR Image Translation 文章

ArXiv CS.CV2026-06-19NEWSen作者: Kaili Wang, Martin Dimitrievski, Jose Maria Salvador, Ben Stoffelen, David Van Hamme, Lore Goetschalckx

查看原文 →

关系图谱

详细信息

来源站点: ArXiv CS.CV
作者: Kaili Wang, Martin Dimitrievski, Jose Maria Salvador, Ben Stoffelen, David Van Hamme, Lore Goetschalckx
文章类型: NEWS
语言: en
发布日期: 2026-06-19

原文

摘要

arXiv:2606.19961v1 Announce Type: new Abstract: Latent diffusion models (LDMs) enable efficient image-to-image translation but discard fine spatial details during compression, degrading downstream perception tasks. We identify two bottlenecks: the autoencoder, which loses spatial information, and the conditioning pathway, which further degrades the source signal through naive downsampling. We propose two lightweight, backbone-agnostic fixes: a Source-Conditioned Autoencoder (SCAE) that injects high-resolution source features into the decoder via skip connections, and a Learnable Guidance Encoder (LGE) that replaces naive downsampling with a learned conditioning signal. Evaluated on RGB-to-SWIR translation for driving scenes with two denoiser backbones (U-Net and DiT), our approach improves detection mAP by up to 2x over the latent diffusion baseline, with up to 3.4x gains on small objects (COCO-small, <32^2 px^2), while achieving state-of-the-art FID.

Addressing Detail Bottlenecks in Latent Diffusion for RGB-to-SWIR Image Translation 文章

详细信息

摘要

相关事件

相关公司

相关人物

相关产品查看全部 (2)

相关技术查看全部 (6)