v-HUB: A Benchmark for Video Humor Understanding from Vision and Sound [Event]

OPEN_SOURCE | 2026-06-02 | Impact: MEDIUM

arXiv:2509.25773v3 | Announce Type: replace

Abstract: AI models capable of comprehending humor hold real-world promise, for example by enhancing engagement in human-machine interactions. To gauge and diagnose the capacity of multimodal large language models (MLLMs) for humor understanding, we introduce v-HUB, a novel video humor understanding benchmark. v-HUB comprises a curated collection of non-verbal short videos, reflectin