ASTRA: Communication-Efficient Acceleration for Multi-Device Transformer Inference · Event

PRODUCT_LAUNCH · 2026-05-28 · Impact: MEDIUM

arXiv:2505.19342v2 (Announce Type: replace-cross)

Abstract: Multi-device inference can reduce Transformer latency by parallelizing computation. However, existing methods require high inter-device bandwidth, making them impractical for bandwidth-constrained environments. We present ASTRA, a communication-efficient framework that integrates sequence parallelism with mixed-precision attention, where non-local token em…
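The general idea of combining sequence parallelism with mixed-precision attention can be illustrated with a small, hypothetical sketch. The text above does not say how ASTRA encodes non-local token embeddings, so this example assumes, purely for illustration, plain symmetric per-tensor int8 quantization for anything sent between devices, while each device keeps its own tokens in full precision. It also simplifies attention by using the raw embeddings as keys and values (no projections, no causal mask). All function names (`quantize_int8`, `sequence_parallel_attention`, `comm_bytes`, etc.) are hypothetical and are not the authors' implementation.

```python
import math
from typing import List, Tuple

Vector = List[float]


def quantize_int8(x: List[Vector]) -> Tuple[List[List[int]], float]:
    """Symmetric per-tensor int8 quantization (an assumed stand-in, not ASTRA's scheme)."""
    max_abs = max((abs(v) for row in x for v in row), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [[max(-127, min(127, round(v / scale))) for v in row] for row in x]
    return q, scale


def dequantize_int8(q: List[List[int]], scale: float) -> List[Vector]:
    """Map int8 codes back to floats on the receiving device."""
    return [[v * scale for v in row] for row in q]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention for a single query (no causal mask)."""
    d = len(query)
    scores = [sum(a * b for a, b in zip(query, k)) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]


def sequence_parallel_attention(tokens: List[Vector], num_devices: int) -> List[Vector]:
    """Simulate sequence parallelism: each 'device' owns a contiguous shard of tokens."""
    shard = math.ceil(len(tokens) / num_devices)
    shards = [tokens[i:i + shard] for i in range(0, len(tokens), shard)]
    # Each device would transmit only the low-precision version of its shard.
    sent = [quantize_int8(s) for s in shards]
    outputs: List[Vector] = []
    for dev, local in enumerate(shards):
        context: List[Vector] = []
        for src, (q, scale) in enumerate(sent):
            if src == dev:
                context.extend(local)  # local tokens stay in full precision
            else:
                context.extend(dequantize_int8(q, scale))  # non-local tokens arrive as int8
        for query in local:
            outputs.append(attend(query, context, context))
    return outputs


def comm_bytes(num_tokens: int, dim: int, num_devices: int, bytes_per_elem: int) -> int:
    """Total bytes moved if every device sends its shard to all other devices.

    Ignores the (small) per-shard scale overhead.
    """
    return num_tokens * dim * bytes_per_elem * (num_devices - 1)
```

With a single device the result matches ordinary full-precision attention exactly; with several devices the output differs only by the quantization error on non-local tokens. Under these assumptions, `comm_bytes(1024, 4096, 4, 2)` (fp16) is 25,165,824 bytes, and switching to 1 byte per element halves that traffic. Real systems may use other bit-widths or encodings; these numbers are not results from the paper.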
