Disentanglement-Based Equivariant Learning for Compositional VQA

ArXiv cs.CV | 2026-06-02 | Authors: Zhou Du, Zhaoquan Yuan, Xiao Wu, Changsheng Xu

Abstract

arXiv:2606.02168v1 (announce type: new)

Compositional visual question answering (VQA) is a challenging yet fundamental task that requires models to comprehend novel combinations of previously learned concepts. Current methods often overlook the disentanglement of underlying concepts and are limited in their ability to capture the compositional variation mechanism. Moreover, state-of-the-art techniques depend on additional clues for training, which is not feasible in real-world VQA scenarios. To address these issues, we introduce a novel Disentanglement-based EquivAriant Learning (DEAL) framework for compositional VQA, which is guided exclusively by ground-truth answers. In DEAL, we employ causality-inspired interventions to disentangle concepts derived from visual and textual inputs within a re-encoding framework.
