Serving DNNs in Real Time at Datacenter Scale with Project Brainwave

2018 · IEEE Micro · Cited by 337

Cloud Computing and Resource Management · Software System Performance and Reliability · IoT and Edge/Fog Computing

Abstract

To meet the computational demands of deep learning, cloud operators are turning toward specialized hardware for improved efficiency and performance. Project Brainwave, Microsoft's principal infrastructure for real-time AI serving, accelerates deep neural network (DNN) inferencing in major services such as Bing's intelligent search features and Azure. Exploiting distributed model parallelism and pinning over low-latency hardware microservices, Project Brainwave serves state-of-the-art, pre-trained DNN models with high efficiency at low batch sizes. A high-performance, precision-adaptable FPGA soft processor is at the heart of the system, achieving up to 39.5 teraflops (Tflops) of effective performance at Batch 1 on a state-of-the-art Intel Stratix 10 FPGA.
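The idea of pinning a model across several devices with distributed model parallelism can be illustrated with a minimal sketch. It assumes, purely for illustration, that each FPGA has a fixed on-chip memory budget for weights and that layers are assigned in order to consecutive devices, forming a pipeline. The function name `pin_layers` and the abstract size units are hypothetical; this is a simple greedy partitioner, not Project Brainwave's actual partitioning algorithm.

```python
from typing import List


def pin_layers(weight_sizes: List[int], capacity: int) -> List[List[int]]:
    """Hypothetical greedy partitioner: assign consecutive layers to devices
    so that each device's pinned weights fit within `capacity`.

    Returns one list of layer indices per device, in pipeline order.
    """
    devices: List[List[int]] = []
    current: List[int] = []
    used = 0
    for layer, size in enumerate(weight_sizes):
        # A single layer larger than one device's budget cannot be pinned
        # by this simple scheme (it would need intra-layer splitting).
        if size > capacity:
            raise ValueError(f"layer {layer} (size {size}) exceeds per-device capacity {capacity}")
        # Start a new device when the next layer would overflow the current one.
        if used + size > capacity:
            devices.append(current)
            current, used = [], 0
        current.append(layer)
        used += size
    if current:
        devices.append(current)
    return devices


if __name__ == "__main__":
    # Five layers with illustrative weight sizes, 60 units per device.
    print(pin_layers([30, 40, 20, 50, 10], 60))  # [[0], [1, 2], [3, 4]]
```

Because every layer's weights stay resident on some device, a Batch 1 request only streams activations from device to device rather than reloading weights, which is the intuition behind serving efficiently at low batch sizes.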

Authors (42 in total; 4 listed below)

Dan Zhang

Doug Burger

Ritchie Zhao

Phillip Yi Xiao
