High-performance implementation of the level-3 BLAS (paper)

2008 · ACM Transactions on Mathematical Software · Cited by 314

Parallel Computing and Optimization Techniques · Interconnection Networks and Systems · Advanced Data Storage Technologies

Abstract

A simple but highly effective approach for transforming high-performance implementations on cache-based architectures of matrix-matrix multiplication into implementations of other commonly used matrix-matrix computations (the level-3 BLAS) is presented. Exceptional performance is demonstrated on various architectures.
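The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea it names: expressing another level-3 BLAS operation mostly in terms of matrix-matrix multiplication. It uses the symmetric rank-k update (C := C + A Aᵀ, lower triangle only) as the example. The function names `gemm_nt`, `syrk_diag`, and `syrk_lower_blocked` and the block size are hypothetical, chosen for illustration, and the pure-Python loops are meant to show the structure rather than to perform well.

```python
def gemm_nt(A, B, C, r0, r1, c0, c1):
    """General multiply on one block: C[r0:r1, c0:c1] += A[r0:r1, :] @ B[c0:c1, :]^T."""
    k = len(A[0])
    for i in range(r0, r1):
        for j in range(c0, c1):
            C[i][j] += sum(A[i][p] * B[j][p] for p in range(k))


def syrk_diag(A, C, r0, r1):
    """Diagonal block: update only the lower triangle of C[r0:r1, r0:r1]."""
    k = len(A[0])
    for i in range(r0, r1):
        for j in range(r0, i + 1):
            C[i][j] += sum(A[i][p] * A[j][p] for p in range(k))


def syrk_lower_blocked(A, C, block=2):
    """Lower-triangular C := C + A @ A^T, with off-diagonal blocks done by gemm_nt.

    Assumption for illustration: strictly-lower blocks go through the general
    multiply, and only the small diagonal blocks need triangle-aware code.
    """
    n = len(A)
    for r0 in range(0, n, block):
        r1 = min(r0 + block, n)
        # Blocks strictly below the diagonal are plain matrix-matrix products.
        for c0 in range(0, r0, block):
            gemm_nt(A, A, C, r0, r1, c0, c0 + block)
        # The diagonal block touches only entries with column <= row.
        syrk_diag(A, C, r0, r1)
    return C
```

For an n-by-n result, the number of off-diagonal blocks grows quadratically with n / block while the number of diagonal blocks grows only linearly, so in this sketch most of the arithmetic passes through the general multiply routine.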

Authors

Robert A. van de Geijn

Kazushige Goto
