Beyond the Literal: Decomposing Pragmatic Intent in Multimodal Meme Understanding
Event: PRODUCT_LAUNCH | Date: 2026-06-03 | Impact: MEDIUM
arXiv:2606.03604v1 Announce Type: new
Abstract: When asked what a meme or sarcastic post means, Large Vision Language Models (LVLMs) tend to describe what the image shows rather than what the author is trying to communicate. Standard instruction tuning entangles a post's literal content with its pragmatic meaning, letting surface-level details contaminate the final response. We reframe meme understanding as a prob