Adaptive Tokenisation Via Temporal Redundancy Masking And Latent Inpainting 文章

ArXiv CS.CV2026-06-05NEWSen作者: Kevin Dave, Sai Aditya Patkuri, Chhaya Kumar Das, Gouranga Bala, R. Venkatesh Babu, Rajeshkumar SA

摘要

arXiv:2606.06158v1 Announce Type: new Abstract: Adaptive video tokenisation seeks to dynamically allocate token budgets based on the underlying visual complexity of a sequence. Current continuous-regime approaches achieve this via iterative binarised searches or trained neural regressors, while discrete methods often require a full-rate decoder pass to estimate information content. We demonstrate that such computational overheads are not strictly necessary. We show that the latent space of a frozen continuous video tokeniser inherently encodes temporal redundancy that can be exploited directly: spatial positions whose latent representations change minimally between consecutive frames carry near-zero additional information. We introduce a parameter-free adaptive token allocation mechanism that applies a fixed threshold to per-position temporal-L1 differences, identifying and dropping redundant latent positions.

相关公司

暂无数据

相关人物

暂无数据

相关产品

暂无数据