Decoupling Semantics from Distortions: Multi-Scale Two-Stream Vision-Language Alignment for AI-Generated Image Quality Assessment

ArXiv CS.CV | 2026-06-16 | NEWS | en | Author: Zijie Meng

Details

Source site
ArXiv CS.CV
Author
Zijie Meng
Article type
NEWS
Language
en
Publication date
2026-06-16

Abstract

arXiv:2606.16799v1 (announce type: new)

Existing vision-language model (VLM)-based AI-generated image quality assessment (AIGIQA) methods suffer from a fundamental semantic-distortion dimensional conflict: monolithic representations optimized for semantic discrimination inherently entangle compositional understanding with low-level perceptual sensitivity, rendering them blind to fine-grained quality degradations. We introduce MST-CLIPIQA, a multi-scale two-stream framework that achieves hierarchical vision-language alignment through explicit representational decoupling. Our architecture leverages dual CLIP encoders with complementary patch granularities: coarse-grained streams capture global semantic coherence while fine-grained streams preserve textural signatures and artifact patterns.
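To make the idea of "complementary patch granularities" concrete, here is a minimal NumPy sketch of a two-stream layout. It is not the MST-CLIPIQA implementation: the patch sizes (32 and 16), the 224x224 input, the 64-dimensional features, the random linear projection standing in for each CLIP encoder, and the fusion by concatenation are all assumptions chosen for illustration, as are the function names `patchify` and `encode_stream`.

```python
import numpy as np


def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into non-overlapping, flattened patches."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    gh, gw = h // patch, w // patch
    x = image.reshape(gh, patch, gw, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (grid_row, grid_col, patch_row, patch_col, C)
    return x.reshape(gh * gw, patch * patch * c)


def encode_stream(image, patch, dim, rng):
    """Hypothetical stand-in for one encoder stream.

    Projects each patch with a random linear map, mean-pools the tokens,
    and L2-normalizes the result. A real system would use a CLIP encoder.
    """
    tokens = patchify(image, patch)
    proj = rng.standard_normal((tokens.shape[1], dim)) / np.sqrt(tokens.shape[1])
    feat = (tokens @ proj).mean(axis=0)
    return tokens.shape[0], feat / np.linalg.norm(feat)


rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))  # placeholder image, assumed size

# Coarse stream: fewer, larger patches (global layout).
n_coarse, f_coarse = encode_stream(image, patch=32, dim=64, rng=rng)
# Fine stream: more, smaller patches (local texture and artifacts).
n_fine, f_fine = encode_stream(image, patch=16, dim=64, rng=rng)

# Assumed fusion: simple concatenation of the two stream features.
fused = np.concatenate([f_coarse, f_fine])
print(n_coarse, n_fine, fused.shape)  # 49 196 (128,)
```

Under these assumptions, the coarse stream sees 49 tokens covering 32x32 pixel regions, while the fine stream sees 196 tokens covering 16x16 regions, which is the sense in which the two streams observe the same image at different granularities.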