DynFrame: Adaptive Reasoning-Driven Multimodal Framework with Dynamic Frame Augmentation for Complex Video Understanding
arXiv cs.CV · 2026-05-27 · Authors: Peng Zhang, Guanghao Zhang, Wanggui He, Longxiang Zhang, Mushui Liu, Yan Xia, Zhenhao Peng, Weilong Dai, Jinlong Liu, Haobing Tang, Le Zhang, Hao Jiang, Pipei Huang
Related technologies
Multimodal Large Language Models (MLLMs), ODE, LLM, language model, multimodal, eval, augmentation, adaptive reasoning, UCT, Token, Straight-Through Estimator, StanSUL, Referring expression comprehension (REC), Narrative Abstraction Benchmark, LMM, large language models, Grouped Memorization Evaluation, Granular Alignment Paradigm, ForFFI, Effort Metric Attention, DiT, DAPT, CISARGANN