The OpenMP Implementation of NAS Parallel Benchmarks and its Performance (paper)

2013 | Citations: 479
Parallel Computing and Optimization Techniques; Advanced Data Storage Technologies; Distributed and Parallel Computing Systems

Abstract

As ccNUMA architectures have become popular in recent years, parallel programming with compiler directives on these machines has evolved to accommodate new needs. In this study, we examine the effectiveness of OpenMP directives for parallelizing the NAS Parallel Benchmarks. We discuss implementation details and compare performance with the MPI implementation. Our results demonstrate that OpenMP can achieve very good results for parallelization on a shared-memory system, but that effective use of memory and cache is very important.