ASTRA: Communication-Efficient Acceleration for Multi-Device Transformer Inference
Event: PRODUCT_LAUNCH | Date: 2026-05-28 | Impact: MEDIUM
arXiv:2505.19342v2
Announce Type: replace-cross
Abstract: Multi-device inference can reduce Transformer latency by parallelizing computation. However, existing methods require high inter-device bandwidth, making them impractical for bandwidth-constrained environments. We present ASTRA, a communication-efficient framework that integrates sequence parallelism with mixed-precision attention, where non-local token em
Related coverage (1)
ASTRA: Communication-Efficient Acceleration for Multi-Device Transformer Inference
ArXiv CS.AI, 2026-05-28