The Scalasca performance toolset architecture

Concurrency and Computation: Practice and Experience (2010). Cited by 347.
Parallel Computing and Optimization Techniques; Distributed Systems and Fault Tolerance; Software System Performance and Reliability

Abstract

Scalasca is a performance toolset that has been specifically designed to analyze parallel application execution behavior on large-scale systems with many thousands of processors. It offers an incremental performance-analysis procedure that integrates runtime summaries with in-depth studies of concurrent behavior via event tracing, adopting a strategy of successively refined measurement configurations. Distinctive features are its ability to identify wait states in applications with very large numbers of processes and to combine these with efficiently summarized local measurements. In this article, we review the current toolset architecture, emphasizing its scalable design and the role of the different components in transforming raw measurement data into knowledge of application execution behavior. The scalability and effectiveness of Scalasca are then surveyed from experience measuring and analyzing real-world applications on a range of computer systems.

Copyright © 2010 John Wiley & Sons, Ltd.
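To make "identifying wait states" concrete, the sketch below computes one classic pattern from Scalasca's catalog, Late Sender: a receiver enters a blocking receive before the matching send has started and sits idle until it does. This is a minimal, sequential illustration under stated assumptions, not Scalasca's implementation. The `Event` record and the `late_sender_waits` function are hypothetical. Timestamps are assumed to be already synchronized across processes. Messages are matched in FIFO order per (sender, receiver, tag) channel, following MPI's non-overtaking rule.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical trace record for one blocking point-to-point call."""
    process: int   # rank that executed the call
    kind: str      # "send" or "recv"
    peer: int      # partner rank (destination for send, source for recv)
    tag: int       # message tag
    enter: float   # time the process entered the call (seconds)
    exit: float    # time the process left the call (seconds)


def late_sender_waits(events):
    """Return total Late Sender waiting time per receiving rank.

    A receive entered before its matching send was entered waits for
    (send.enter - recv.enter), capped at the receive's own duration.
    """
    sends = defaultdict(list)  # (src, dst, tag) -> send events
    recvs = defaultdict(list)  # (src, dst, tag) -> recv events
    for ev in events:
        if ev.kind == "send":
            sends[(ev.process, ev.peer, ev.tag)].append(ev)
        elif ev.kind == "recv":
            recvs[(ev.peer, ev.process, ev.tag)].append(ev)

    waits = defaultdict(float)
    for channel, recv_list in recvs.items():
        # Program order on each side approximates matching order per channel.
        send_list = sorted(sends.get(channel, []), key=lambda e: e.enter)
        recv_list = sorted(recv_list, key=lambda e: e.enter)
        for send, recv in zip(send_list, recv_list):
            wait = send.enter - recv.enter
            if wait > 0:  # receiver arrived first: a Late Sender wait state
                waits[recv.process] += min(wait, recv.exit - recv.enter)
    return dict(waits)


if __name__ == "__main__":
    trace = [
        Event(0, "send", 1, 7, enter=5.0, exit=5.1),  # sender starts late
        Event(1, "recv", 0, 7, enter=2.0, exit=5.2),  # receiver waits ~3 s
        Event(2, "send", 3, 7, enter=1.0, exit=1.1),  # sender on time
        Event(3, "recv", 2, 7, enter=4.0, exit=4.1),  # no wait
    ]
    print(late_sender_waits(trace))
```

The flat in-memory list is chosen only for readability. At the scales the abstract targets, the trace data is distributed across thousands of processes, so a practical analyzer must avoid gathering all events in one place as this sketch does.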