A Top-Down method for performance analysis and counters architecture 论文

2014引用 310
Parallel Computing and Optimization TechniquesRadiation Effects in ElectronicsInterconnection Networks and Systems

摘要

Optimizing an application's performance for a given microarchitecture has become painfully difficult. Increasing microarchitecture complexity, workload diversity, and the unmanageable volume of data produced by performance tools increase the optimization challenges. At the same time resource and time constraints get tougher with recently emerged segments. This further calls for accurate and prompt analysis methods. The insights from this method guide a proposal for a novel performance counters architecture that can determine the true bottlenecks of a general out-of-order processor. Unlike other approaches, our analysis method is low-cost and already featured in in-production systems - it requires just eight simple new performance events to be added to a traditional PMU. It is comprehensive - no restriction to predefined set of performance issues. It accounts for granular bottlenecks in super-scalar cores, missed by earlier approaches.