PEEK: Picking Essential frames via Efficient Knowledge distillation

Event: PRODUCT_LAUNCH
Date: 2026-06-01
Impact: MEDIUM

arXiv:2605.31029v1
Announce Type: new

Abstract: Video-language models can process only a limited number of frames, making frame selection a key bottleneck for efficient video captioning. Most captioning pipelines still rely on uniform sampling, which is computationally cheap but agnostic to visual content. Adaptive frame sampling has recently emerged as a promising approach for selecting the most informative frames from a video