A high performance FPGA-based accelerator for large-scale convolutional neural networks (paper)

2016, cited by 310

Artificial intelligence: Advanced Neural Network Applications; CCD and CMOS Imaging Sensors; Advanced Memory and Neural Computing

Abstract

In recent years, machine learning algorithms based on convolutional neural networks (CNNs) have been widely applied in computer vision. However, large-scale CNNs are computation-intensive, memory-intensive and resource-consuming, which poses many challenges for CNN implementations. This work proposes an end-to-end FPGA-based CNN accelerator with all layers mapped onto one chip, so that different layers can work concurrently in a pipelined structure to increase throughput. A methodology that finds the optimized parallelism strategy for each layer is proposed to achieve high throughput and high resource utilization. In addition, because fully connected layers (FC layers) are memory-intensive, a batch-based computing method is implemented and applied to them to increase memory bandwidth utilization. Further, by applying two different computing patterns to FC layers, the required on-chip buffers can be reduced significantly. As a case study, a state-of-the-art large-scale CNN, AlexNet, is implemented on a Xilinx VC709. It achieves a peak performance of 565.94 GOP/s and 391 FPS at a 156 MHz clock frequency, which outperforms previous approaches.
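The benefit of batching on FC layers can be illustrated with a rough traffic model. The sketch below is not the paper's implementation. It only counts how many weight bytes must be read from off-chip memory when each fetched weight is reused across every image in a batch. The function names, the 2-byte weight width, and the batch size of 32 are assumptions made for illustration. The 9216 x 4096 layer shape is the commonly cited size of AlexNet's first FC layer and is not stated on this page.

```python
def fc_weight_bytes_read(n_in, n_out, n_images, batch_size, bytes_per_weight=2):
    """Weight bytes fetched from off-chip memory to run one FC layer on n_images.

    Assumption: the full n_in x n_out weight matrix is streamed in once per
    batch and reused for every image in that batch.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    n_batches = -(-n_images // batch_size)  # ceiling division
    return n_batches * n_in * n_out * bytes_per_weight


def ops_per_weight_byte(batch_size, bytes_per_weight=2):
    """Operations performed per weight byte fetched.

    Each weight contributes one multiply and one add (2 ops) per image.
    """
    return 2 * batch_size / bytes_per_weight


if __name__ == "__main__":
    # Hypothetical comparison: 32 images, unbatched versus one batch of 32.
    unbatched = fc_weight_bytes_read(9216, 4096, n_images=32, batch_size=1)
    batched = fc_weight_bytes_read(9216, 4096, n_images=32, batch_size=32)
    print("weight bytes, batch=1 :", unbatched)
    print("weight bytes, batch=32:", batched)
    print("ops per weight byte, batch=32:", ops_per_weight_byte(32))
```

Under these assumptions, weight traffic per image falls in proportion to the batch size, which is why batching helps a layer whose speed is limited by memory bandwidth rather than by arithmetic.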

Authors (6 in total)

Wei Cao

Lingli Wang

Xuegong Zhou

Jiao Li
