Towards Unified Vision-Language Models with Incomplete Multi-Modal Inputs (Event)

PRODUCT_LAUNCH · 2026-05-28 · Impact: MEDIUM

arXiv:2605.27894v1 · Announce Type: new

Abstract: Video-Language Models (VLMs) have demonstrated impressive multi-modal reasoning capabilities across diverse computer vision applications. These VLMs, however, are task-specific and assume that both video and language inputs are complete. Real-world VLM applications might face challenges due to deactivated sensors (e.g., cameras are unavailable due to data privacy), y
