In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
will defend his dissertation
Optimizing the Performance of Directive-based Programming Model for GPGPUs
Many scientific and technical programmers turn to accelerators to speed up large scientific applications. Accelerators such as GPUs offer immense compute capacity, but programming these devices is a challenge. OpenCL, CUDA, and other low-level models, some of them vendor-specific, offer high performance, but they demand excellent programming skills, and programs written in them are time-consuming to write and debug. To simplify GPU programming, several directive-based programming models have been proposed, such as HMPP, the PGI accelerator model, and OpenACC, which has since been established as the standard. We evaluate and compare these models on several scientific applications. Because OpenACC development is still in its early stages, most existing OpenACC compilers, such as those from PGI and Cray, are developed by industrial companies. To study the implementation challenges, principles, and techniques of directive-based models, we built an open-source OpenACC compiler on top of a mainstream compiler framework (OpenUH, a branch of Open64). In this dissertation, we present the techniques required to parallelize and optimize applications ported to the OpenACC programming model. We apply both user-level optimizations in the applications and compiler- and runtime-driven optimizations. The compiler optimization focuses on parallelizing reduction operations inside nested parallel loops. To fully utilize all GPU resources, we also extend the OpenACC model to support multiple GPUs in a single node. Our application porting experience also reveals the challenge of choosing good loop schedules: the default loop schedule chosen by the compiler may not produce the best performance, so the user has to try different loop schedule options to improve it. To solve this issue, we develop a locality-aware auto-tuning framework, based on a proposed memory access cost model, that helps the compiler choose optimal loop schedules and guides the user toward appropriate ones.
Date: Wednesday, April 20, 2016
Time: 10:00 AM
Place: PGH 218
Advisor: Prof. Barbara Chapman
Faculty, students, and the general public are invited.