CMAP: Cross-Modal Adaptive Prompting for Multi-Domain Task-Incremental Learning

ArXiv CS.CV | 2026-05-26 | Author: Sriram Mandalika

Abstract

arXiv:2605.25708v1

Multi-domain task-incremental learning requires a model to sequentially acquire knowledge across visually diverse domains without forgetting prior tasks, and without access to task identity at inference. Parameter-efficient methods built on frozen vision-language models have made strong progress, yet all existing approaches rely exclusively on visual features for task routing, confidence estimation, and encoder adaptation, leaving CLIP's cross-modal text embedding space entirely unexploited. We address this gap through three contributions. Text-space task routing replaces visual Gaussian matching with cosine similarity to frozen CLIP text prototypes, giving order-independent routing robust to data scarcity at zero parameter cost. Multi-prototype visual-textual confidence replaces single-Gaussian class modeling with K-means visual prototypes and cross-modal alignment scores under task-calibrated thresholds.
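The two mechanisms named in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not say how per-task scores are aggregated or how the visual and textual scores are combined, so the max-over-prototypes rule in `route_task` and the simple average in `cross_modal_confidence` are assumptions, as are all function and parameter names. The vectors are toy stand-ins for frozen CLIP embeddings, and the visual prototypes are assumed to be K-means centroids computed beforehand.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def route_task(image_embedding, task_text_prototypes):
    """Pick the task whose text prototypes best match the image embedding.

    task_text_prototypes maps a task name to a list of text embeddings
    (for example, one per class name). Assumption: a task's score is the
    maximum cosine similarity over its prototypes.
    """
    best_task, best_score = None, -math.inf
    for task, prototypes in task_text_prototypes.items():
        score = max(cosine(image_embedding, p) for p in prototypes)
        if score > best_score:
            best_task, best_score = task, score
    return best_task, best_score


def cross_modal_confidence(image_embedding, visual_prototypes, text_prototype, threshold):
    """Score one class using several visual prototypes plus its text prototype.

    Assumptions: visual_prototypes are precomputed K-means centroids, the two
    scores are averaged, and threshold is a per-task calibrated value.
    Returns (score, accepted).
    """
    visual_score = max(cosine(image_embedding, p) for p in visual_prototypes)
    text_score = cosine(image_embedding, text_prototype)
    score = 0.5 * (visual_score + text_score)
    return score, score >= threshold


# Toy 3-dimensional "embeddings"; real CLIP embeddings have hundreds of dimensions.
text_prototypes = {
    "digits": [[1.0, 0.0, 0.0]],
    "aircraft": [[0.0, 1.0, 0.0], [0.0, 0.8, 0.6]],
}
image = [0.1, 0.9, 0.2]

task, routing_score = route_task(image, text_prototypes)
score, accepted = cross_modal_confidence(
    image,
    visual_prototypes=[[0.0, 1.0, 0.0], [0.0, 0.9, 0.1]],
    text_prototype=[0.0, 0.8, 0.6],
    threshold=0.9,
)
print(task, round(routing_score, 3), round(score, 3), accepted)
```

Because the text prototypes come from a frozen encoder and each task is scored independently, the routing result in this sketch does not depend on the order in which tasks were added, and registering a new task only means storing its prototypes rather than training new parameters.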
