Distilling Counterfactual Reasoning from Language to Vision: Causal Graph Guided Post-Training for Video Understanding (Event)

PRODUCT_LAUNCH | 2026-06-01 | Impact: MEDIUM

arXiv:2511.19923v2 | Announce Type: replace

Abstract: Vision Language Models (VLMs) have recently shown significant advancements in video understanding, especially in feature alignment, event reasoning, and instruction-following tasks. However, their capability for counterfactual reasoning (inferring alternative outcomes under hypothetical conditions) remains underexplored. This