In Partial Fulfillment of the Requirements for the Degree of
Master of Science
Will defend her thesis
Today's high-performance computer vendors must take power efficiency into account when designing their hardware. Multicore processors are a widespread response to this concern; they form the basis of recent SMPs and clusters that aim to deliver lower power consumption and higher performance per watt and per dollar. The SiCortex distributed-memory platform, built from multicore SMP nodes, is an extreme example of this trend, offering parallel compute performance with particularly low power consumption. Research into parallel programming strategies suited to these new power-efficient SMP clusters is therefore essential. The hybrid MPI + OpenMP programming model is widely considered a natural fit for the cluster-of-SMPs architecture, and previous studies of its performance on SMP clusters have reported mixed results. This thesis presents a performance study of the hybrid MPI + OpenMP programming model on the new large-scale, power-efficient SMP cluster from SiCortex, covering three aspects: 1) a study of the performance and overheads of OpenMP constructs using the EPCC microbenchmarks; 2) a scalability study of the hybrid MPI + OpenMP model using the hybrid versions of the NAS Multi-Zone benchmarks (BT-MZ, LU-MZ, and SP-MZ); and 3) a performance comparison of the hybrid model against the pure MPI model using two profiling and tracing tools (mpipex and tauex). Experimental results show that OpenMP constructs achieve competitive intrinsic overheads on this low-frequency architecture compared with a higher-frequency SMP machine. The hybrid programming model also shows reasonably good scalability and, for applications with load-balancing problems, a significant performance benefit over pure MPI.
This thesis also presents results from the two performance tools and shows how they can be used to analyze hybrid applications, including examination of MPI communication time and detailed trace and event information for all threads and processes.