hZACH-ViT: Curved Latent Geometry for Compact Vision Transformers in Low-Data Medical Imaging

arXiv cs.CV | 2026-06-02 | Author: Athanasios Angelakis

Abstract

arXiv:2606.00906v1 (announce type: new)

Compact Vision Transformers are attractive for medical imaging in low-data and resource-constrained settings, but most existing variants assume that Euclidean latent geometry is sufficient for organizing image representations. We introduce hZACH-ViT, a family of curved-geometry extensions of ZACH-ViT, a compact zero-token Vision Transformer that removes positional embeddings and the class token and relies on global average pooling over patch representations. To isolate the role of geometry, we preserve the verified ZACH-ViT backbone and modify only the final representation space and prototype-based classifier head, enabling a controlled comparison between Euclidean, hyperbolic, and spherical latent geometries. We evaluate Poincaré, Klein, and spherical hZACH-ViT heads on seven MedMNIST datasets under an identical few-shot protocol with 50 samples per class and five random seeds.
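To make the comparison concrete, here is a minimal NumPy sketch of a prototype-based head applied to globally average-pooled patch representations, with the distance function swapped between Euclidean, Poincaré, Klein, and spherical geometry. It is an illustration under stated assumptions, not the paper's implementation: the function names, the unit curvature, the clipping-based projection into the unit ball, and the use of negative distance as the logit are all hypothetical choices made for this example.

```python
import numpy as np


def project_to_ball(x, eps=1e-5):
    """Rescale points so they lie strictly inside the unit ball (assumed mapping)."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    max_norm = 1.0 - eps
    scale = np.where(norm > max_norm, max_norm / np.maximum(norm, 1e-12), 1.0)
    return x * scale


def euclidean_dist(x, p):
    """Pairwise Euclidean distance: x is (n, d), p is (k, d), result is (n, k)."""
    return np.linalg.norm(x[:, None, :] - p[None, :, :], axis=-1)


def poincare_dist(x, p):
    """Pairwise distance in the Poincare ball model, curvature -1."""
    x, p = project_to_ball(x), project_to_ball(p)
    sq = np.sum((x[:, None, :] - p[None, :, :]) ** 2, axis=-1)
    dx = 1.0 - np.sum(x ** 2, axis=-1)[:, None]
    dp = 1.0 - np.sum(p ** 2, axis=-1)[None, :]
    return np.arccosh(np.maximum(1.0 + 2.0 * sq / (dx * dp), 1.0))


def klein_dist(x, p):
    """Pairwise distance in the Klein model, curvature -1."""
    x, p = project_to_ball(x), project_to_ball(p)
    inner = x @ p.T
    dx = 1.0 - np.sum(x ** 2, axis=-1)[:, None]
    dp = 1.0 - np.sum(p ** 2, axis=-1)[None, :]
    return np.arccosh(np.maximum((1.0 - inner) / np.sqrt(dx * dp), 1.0))


def spherical_dist(x, p):
    """Pairwise great-circle distance after normalizing onto the unit sphere."""
    xn = x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), 1e-12)
    pn = p / np.maximum(np.linalg.norm(p, axis=-1, keepdims=True), 1e-12)
    return np.arccos(np.clip(xn @ pn.T, -1.0, 1.0))


def prototype_logits(tokens, prototypes, dist_fn):
    """Pool patch tokens (n, num_patches, d) and score against (k, d) prototypes."""
    pooled = tokens.mean(axis=1)          # global average pooling, no class token
    return -dist_fn(pooled, prototypes)   # closer prototype gives a larger logit


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens = 0.1 * rng.standard_normal((4, 16, 8))   # hypothetical backbone output
    prototypes = 0.1 * rng.standard_normal((3, 8))   # one prototype per class
    for fn in (euclidean_dist, poincare_dist, klein_dist, spherical_dist):
        print(fn.__name__, prototype_logits(tokens, prototypes, fn).argmax(axis=1))
```

Because only `dist_fn` changes between runs, the pooled representation and the prototypes stay fixed, which mirrors the controlled comparison the abstract describes: the backbone is held constant and only the geometry of the final space varies.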