Calendar - University of Houston

[Defense] Matrix Computations on TensorCore GPU

Wednesday, April 20, 2022

4:00 pm - 7:00 pm

In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
Shaoshuai Zhang
will defend his dissertation
Matrix Computations on TensorCore GPU


Abstract

The emergence of neural engines such as the Nvidia TensorCore GPU has revolutionized deep neural networks, because these engines can perform general matrix multiplications extremely fast. However, how to deploy other algorithms and applications on neural engines remains an open question. In this dissertation, I explore the use of TensorCore GPUs to accelerate BLAS3 operations, linear algebra algorithms, and machine learning algorithms on GPUs, hybrid CPU-GPU architectures, and distributed systems. Specifically, I design TensorCore-based matrix computation algorithms that work on these different architectures. On a single GPU, my work includes implementing some of the basic linear algebra operations, which can then be used in matrix factorizations. For matrix factorization, I develop a recursive QR factorization that uses the TensorCore GPU efficiently. I also use TensorCore to accelerate the two-stage eigenvalue decomposition. In addition, I migrate a scalable CPU-based support vector machine (SVM) tool to TensorCore, which yields a significant speedup and better performance than state-of-the-art GPU-based SVM software. On the hybrid CPU-GPU architecture, I investigate the recursive strategy further through a case study of out-of-core QR factorization, and the results show that the recursive algorithm performs much better than the conventional algorithm. On the distributed memory system, my current work is developing a unique data structure named Universal Distributed Array (UDA), which offers considerable programming flexibility and can also utilize TensorCore. In general, TensorCore-based algorithms achieve very high performance, but they must contend with the accuracy loss caused by using half precision.
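To make the recursive QR idea and the half-precision trade-off concrete, here is a minimal NumPy sketch. It is not the dissertation's implementation. It assumes a recursive block Gram-Schmidt formulation, runs on the CPU, and only emulates TensorCore arithmetic by rounding matrix-multiply inputs to fp16 and accumulating in fp32. The names `gemm`, `recursive_qr`, `relative_residual`, and the `base` and `half` parameters are hypothetical and chosen for illustration.

```python
import numpy as np


def gemm(a, b, half=False):
    """Matrix product. With half=True, emulate TensorCore-style arithmetic:
    inputs are rounded to fp16 and the product is accumulated in fp32."""
    if half:
        a = a.astype(np.float16).astype(np.float32)
        b = b.astype(np.float16).astype(np.float32)
        return (a @ b).astype(np.float64)
    return a @ b


def recursive_qr(a, base=8, half=False):
    """Recursive block Gram-Schmidt QR of a tall matrix (rows >= columns).

    Almost all of the work is done in the two gemm calls, which is what
    makes this formulation a good fit for fast matrix-multiply hardware.
    """
    n = a.shape[1]
    if n <= base:
        # Small panel: fall back to a standard (reduced) QR factorization.
        return np.linalg.qr(a)
    k = n // 2
    # Factor the left half of the columns.
    q1, r11 = recursive_qr(a[:, :k], base, half)
    # Project the right half onto the span of q1.
    r12 = gemm(q1.T, a[:, k:], half)
    # Remove that projection, then factor what remains.
    q2, r22 = recursive_qr(a[:, k:] - gemm(q1, r12, half), base, half)
    q = np.hstack([q1, q2])
    r = np.block([[r11, r12], [np.zeros((n - k, k)), r22]])
    return q, r


def relative_residual(a, q, r):
    """Frobenius-norm error of the factorization relative to the input."""
    return np.linalg.norm(a - q @ r) / np.linalg.norm(a)


rng = np.random.default_rng(0)
a = rng.standard_normal((256, 64))

q_full, r_full = recursive_qr(a)
q_half, r_half = recursive_qr(a, half=True)

err_full = relative_residual(a, q_full, r_full)
err_half = relative_residual(a, q_half, r_half)
print(f"residual, double precision gemm: {err_full:.2e}")
print(f"residual, emulated fp16 gemm:    {err_half:.2e}")
```

On a well-conditioned random matrix, the emulated half-precision run should show a noticeably larger residual than the double-precision run, which illustrates the accuracy loss mentioned in the abstract. Recovering that accuracy is outside the scope of this sketch.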


4PM - 7PM CT
Online via MS Teams

Dr. Panruo Wu, dissertation advisor

Faculty, students and the general public are invited.

Doctoral Dissertation Defense