HASTE: Hardware-Aware Dynamic Sparse Training for Large Output Spaces

ArXiv CS.AI | 2026-06-02 | NEWS | en | Authors: Nasib Ullah, Jinbin Zhang, Jean Lucien Randrianantenaina, Erik Schultheis, Rohit Babbar

Details

Source site: ArXiv CS.AI
Authors: Nasib Ullah, Jinbin Zhang, Jean Lucien Randrianantenaina, Erik Schultheis, Rohit Babbar
Article type: NEWS
Language: en
Publication date: 2026-06-02

Abstract

arXiv:2606.01117v1 (announce type: cross)

Extreme multi-label classification (XMC) involves learning models over large output spaces with millions of labels, making the output layer a memory-compute bottleneck. While sparsity-based methods reduce arithmetic complexity, they often fail to yield proportional speedups due to irregular memory access, poor hardware utilization, or reliance on auxiliary architectural components in long-tailed regimes. We introduce group-shared fixed fan-in sparsity, a semi-structured output-layer design in which semantically related labels share a sparse input pattern while retaining independent weights. This grouping introduces a task-aligned inductive bias, encouraging related labels to share feature subsets, while reducing index memory overhead, increasing feature reuse across labels, and enabling efficient GPU execution via custom CUDA kernels that leverage modern accelerator primitives.
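To make the idea of group-shared fixed fan-in sparsity concrete, the following is a minimal NumPy sketch of a forward pass, not the authors' CUDA kernels. It assumes that labels have already been partitioned into equal-sized groups of related labels stored contiguously, and that each group has a fixed set of `fan_in` input features. The function names (`make_layer`, `forward`, `to_dense`), the random choice of index patterns, and the equal group sizes are all illustrative assumptions; the abstract does not specify them.

```python
import numpy as np


def make_layer(num_groups, group_size, in_dim, fan_in, seed=0):
    """Build a toy group-shared fixed fan-in layer (hypothetical helper).

    Assumption: index patterns are drawn at random here purely for
    illustration; how patterns are chosen or updated is not given in the text.
    """
    rng = np.random.default_rng(seed)
    # One index pattern per GROUP (not per label): fan_in distinct features.
    indices = np.stack([
        np.sort(rng.choice(in_dim, size=fan_in, replace=False))
        for _ in range(num_groups)
    ])  # shape (G, K)
    # Every label in a group keeps its own independent weights.
    weights = rng.standard_normal((num_groups, group_size, fan_in))  # (G, S, K)
    return indices, weights


def forward(x, indices, weights):
    """Compute logits for all labels from a batch x of shape (B, D)."""
    # Gather each group's K features once; all S labels in the group reuse them.
    gathered = x[:, indices]  # shape (B, G, K)
    logits = np.einsum("bgk,gsk->bgs", gathered, weights)  # (B, G, S)
    return logits.reshape(x.shape[0], -1)  # (B, G * S)


def to_dense(indices, weights, in_dim):
    """Expand to the equivalent dense (num_labels, in_dim) matrix for checking."""
    num_groups, group_size, _ = weights.shape
    dense = np.zeros((num_groups, group_size, in_dim))
    for g in range(num_groups):
        # Scatter the group's weights into the columns named by its pattern.
        dense[g][:, indices[g]] = weights[g]
    return dense.reshape(num_groups * group_size, in_dim)


if __name__ == "__main__":
    indices, weights = make_layer(num_groups=3, group_size=5, in_dim=32, fan_in=8)
    x = np.random.default_rng(1).standard_normal((4, 32))
    out = forward(x, indices, weights)
    ref = x @ to_dense(indices, weights, in_dim=32).T
    print(out.shape, np.allclose(out, ref))
```

In this sketch the index array has `num_groups * fan_in` entries, whereas giving every label its own fixed fan-in pattern would need `num_groups * group_size * fan_in` entries. The single gather per group is also what allows the gathered features to be reused across all labels in that group.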
