APB-V: Accelerating Long-Video Understanding via Sequence-Parallelism-aware Approximate Attention (Event)

PRODUCT_LAUNCH | 2026-06-02 | Impact: MEDIUM

arXiv:2601.21444v2 (Announce Type: replace)

Abstract: The efficiency of long-video inference remains a critical bottleneck, mainly due to the dense computation in the prefill stage of Large Multimodal Models (LMMs). Existing methods either compress visual embeddings or apply sparse attention on a single GPU, yielding limited acceleration or degraded performance and restricting LMMs from handling long