O-MARC: Omni Memory-Augmented Compression Distillation for Efficient Video Understanding
Event: PRODUCT_LAUNCH, 2026-05-27, Impact: MEDIUM
arXiv:2605.26584v1 Announce Type: new
Abstract: Omnimodal large language models enable unified audio-video understanding, but long joint token sequences make inference costly, and existing benchmarks do not fully isolate audio-visual association in noisy user-generated videos. We introduce UGC-AVQA, a public UGC benchmark with 1,000 videos and 4,816 QA pairs, where an audio removal test ensures that benchma