Automatic performance tuning of sparse matrix kernels 论文

2003引用 229

Parallel Computing and Optimization TechniquesDistributed and Parallel Computing SystemsTensor decomposition and applications

Parallel Computing and Optimization Techniques Distributed and Parallel Computing Systems Tensor decomposition and applications

关系图谱

作者

摘要

Automatic Performance Tuning of Sparse Matrix Kernels by Richard Wilson Vuduc Doctor of Philosophy in Computer Science University of California, Berkeley Professor James W. Demmel, Chair This dissertation presents an automated system to generate highly e#cient, platformadapted implementations of sparse matrix kernels. These computational kernels lie at the heart of diverse applications in scientific computing, engineering, economic modeling, and information retrieval, to name a few. Informally, sparse kernels are computational operations on matrices whose entries are mostly zero, so that operations with and storage of these zero elements may be eliminated. The challenge in developing high-performance implementations of such kernels is choosing the data structure and code that best exploits the structural properties of the matrix---generally unknown until application run-time---for high-performance on the underlying machine architecture (e.g., memory hierarchy configuration and CPU pipeline structure). We show that conventional implementations of important sparse kernels like sparse matrix-vector multiply (SpMV) have historically run at 10% or less of peak machine speed on cache-based superscalar architectures. Our implementations of SpMV, automatically tuned using a methodology based on empirical-search, can by contrast achieve up to 31% of peak machine speed, and can be up to 4 faster.

作者查看全部 (2)

James Demmel

Richard Vuduc

Automatic performance tuning of sparse matrix kernels 论文

摘要

作者查看全部 (2)

相关技术查看全部 (1)

相关事件

相关文章