Scaling Parallel Sequence Models to Foundation-Scale Vision Encoders (Event)

PRODUCT_LAUNCH | 2026-06-02 | Impact: MEDIUM

arXiv:2606.00746v1 | Announce Type: new

Abstract: Vision foundation models are bottlenecked by the quadratic cost of self-attention, which limits usable resolution and increases the cost of large-scale pretraining. Subquadratic alternatives such as linear attention and state-space models reduce this cost, but often serialize images into 1D token streams and weaken the 2D spatial structure important for vision. Generalized Spatia