In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
will defend his dissertation proposal
Optimizing the Performance of Directive-based Programming Model for GPGPUs
Many scientific and technical programmers regard accelerators as a viable way to program and accelerate huge scientific applications. Accelerators such as GPUs offer immense compute capacity, but programming them is a challenge. Low-level models such as CUDA and OpenCL, along with other vendor-specific models, are one option, but they demand excellent programming skills, and code written in them is time-consuming to write and debug. To simplify GPU programming, several directive-based programming models have been proposed, such as HMPP, the PGI Accelerator model, and OpenACC, which has since been established as the standard among them. Because OpenACC development is still in its early stages, most existing implementations are in commercial compilers. With these commercial compilers, it is not straightforward to diagnose compile-time and runtime errors, or to explain why performance varies between different compilers and even between different versions of the same compiler. We have therefore created an open-source OpenACC compiler in a mainstream compiler framework (OpenUH, a branch of Open64). In this dissertation, we present the techniques required to parallelize and optimize applications ported with the OpenACC programming model. We apply both manual optimizations in the applications and automatic optimizations in the compiler and runtime. The automatic optimizations focus on the design and implementation of the runtime library, the parallelization of reduction operations inside nested parallel loops, and the extension of the OpenACC model from a single GPU to multiple GPUs. Another research issue we address is auto-tuning of loop scheduling, because the default loop schedule chosen by the compiler can show a large performance gap compared to a manually tuned loop schedule.
To solve this issue, we develop a locality-aware auto-tuning framework, based on a memory access cost model, that helps the compiler and runtime choose the optimal loop schedule.
Date: Tuesday, May 5, 2015
Time: 12:30 PM
Place: PGH 218
Advisor: Prof. Barbara Chapman
Faculty, students, and the general public are invited.