Automatic performance tuning of sparse matrix kernels 论文
摘要
Automatic Performance Tuning of Sparse Matrix Kernels by Richard Wilson Vuduc Doctor of Philosophy in Computer Science University of California, Berkeley Professor James W. Demmel, Chair This dissertation presents an automated system to generate highly e#cient, platformadapted implementations of sparse matrix kernels. These computational kernels lie at the heart of diverse applications in scientific computing, engineering, economic modeling, and information retrieval, to name a few. Informally, sparse kernels are computational operations on matrices whose entries are mostly zero, so that operations with and storage of these zero elements may be eliminated. The challenge in developing high-performance implementations of such kernels is choosing the data structure and code that best exploits the structural properties of the matrix---generally unknown until application run-time---for high-performance on the underlying machine architecture (e.g., memory hierarchy configuration and CPU pipeline structure). We show that conventional implementations of important sparse kernels like sparse matrix-vector multiply (SpMV) have historically run at 10% or less of peak machine speed on cache-based superscalar architectures. Our implementations of SpMV, automatically tuned using a methodology based on empirical-search, can by contrast achieve up to 31% of peak machine speed, and can be up to 4 faster.