Disentanglement-Based Equivariant Learning for Compositional VQA

arXiv cs.CV · 2026-06-02 · News · en · Authors: Zhou Du, Zhaoquan Yuan, Xiao Wu, Changsheng Xu

Abstract

arXiv:2606.02168v1 · Announce type: new

Compositional visual question answering (VQA) is a challenging yet fundamental task that requires models to comprehend novel combinations of previously learned concepts. Current methods often overlook the disentanglement of the underlying concepts and are limited in their ability to capture the compositional variation mechanism. Moreover, state-of-the-art techniques depend on additional cues for training, which is not feasible in real-world VQA scenarios. To address these issues, we introduce a novel Disentanglement-based EquivAriant Learning (DEAL) framework for compositional VQA, which is guided exclusively by ground-truth answers. In DEAL, we employ causality-inspired interventions to disentangle concepts derived from visual and textual inputs within a re-encoding framework.
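The abstract does not spell out the equivariance constraint that DEAL uses, so the following is only a generic illustration of the term and not the authors' method. It assumes a toy symbolic "scene" (a dictionary from object names to colors) and a lookup-based answerer. All names here (`answer_color`, `swap_colors`, `transform_scene`, `is_equivariant`) are hypothetical. The sketch checks that changing a concept in the input changes the answer in the corresponding way, i.e. f(g(x)) = g'(f(x)).

```python
from typing import Dict

# Hypothetical symbolic scene: object name -> color label.
Scene = Dict[str, str]


def answer_color(scene: Scene, obj: str) -> str:
    """Toy stand-in for a VQA model: answers 'What color is the <obj>?' by lookup."""
    return scene[obj]


def swap_colors(value: str, a: str, b: str) -> str:
    """Apply the swap a <-> b to a single color label (the output-side transformation)."""
    if value == a:
        return b
    if value == b:
        return a
    return value


def transform_scene(scene: Scene, a: str, b: str) -> Scene:
    """Apply the same swap to every object in the scene (the input-side transformation)."""
    return {obj: swap_colors(color, a, b) for obj, color in scene.items()}


def is_equivariant(scene: Scene, obj: str, a: str, b: str) -> bool:
    """Check f(g(x)) == g'(f(x)) for the toy answerer and the color swap."""
    answer_after_transform = answer_color(transform_scene(scene, a, b), obj)
    transformed_answer = swap_colors(answer_color(scene, obj), a, b)
    return answer_after_transform == transformed_answer


scene = {"cube": "red", "sphere": "blue"}
print(is_equivariant(scene, "cube", "red", "blue"))
```

In a learned model the two sides would generally differ, and the gap between them could serve as a training signal. Whether DEAL uses such a loss is not stated in the abstract.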
