Calendar - University of Houston

[Defense] Matrix Computations on TensorCore

Friday, December 3, 2021

8:00 am - 10:00 am

In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
Shaoshuai Zhang
will defend his proposal
Matrix Computations on TensorCore


Abstract

The emergence of neural engines such as the Nvidia TensorCore GPU has revolutionized deep neural networks, because these engines perform general matrix multiplications (GEMMs) extremely fast. However, how to deploy other algorithms and applications on neural engines remains an open question. In this dissertation, I explore using neural engines to accelerate Level-3 BLAS (BLAS3) operations, linear algebra algorithms, and machine learning algorithms on GPUs, hybrid CPU-GPU architectures, and distributed systems. Specifically, I design TensorCore-based matrix computation algorithms that work on these different architectures. On a single GPU, my work includes implementing the basic linear algebra operations, which can then serve as building blocks for matrix factorizations. Among matrix factorizations, I focus on developing a recursive QR factorization that uses TensorCore efficiently. I would also like to try other matrix factorization algorithms, for instance, factorization and eigenvalue decomposition. In addition, I migrate a scalable CPU-based support vector machine (SVM) tool to TensorCore; the TensorCore version achieves a significant speedup and outperforms other GPU-based SVM software. On the hybrid CPU-GPU architecture, I investigate the recursive strategy further through a case study of out-of-core QR factorization, and the results show that the recursive algorithm performs much better than the conventional algorithm. On distributed-memory systems, my current work is developing a novel data structure named the universal distributed array, which offers considerable programming flexibility and can also use TensorCore. In general, TensorCore-based algorithms achieve very high performance, but they must contend with accuracy loss because they use half precision.
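The abstract does not spell out how the recursive QR factorization works, so the following is only a rough sketch under stated assumptions, not the author's implementation. It uses one common recursive formulation (block classical Gram-Schmidt that splits the columns in half) and emulates a TensorCore-style GEMM in NumPy by rounding both inputs to FP16 and accumulating in FP32. The names `recursive_qr`, `tc_gemm`, `fp64_gemm`, the base-case width of 16 columns, and the 512 x 128 test matrix are hypothetical choices made for illustration. The sketch shows the two points the abstract makes: the recursion pushes most of the work into large GEMMs, and running those GEMMs in half precision costs accuracy.

```python
import numpy as np


def fp64_gemm(a, b):
    """Reference GEMM in double precision."""
    return a @ b


def tc_gemm(a, b):
    """Hypothetical emulation of a TensorCore-style GEMM:
    inputs rounded to FP16, products accumulated in FP32."""
    a16 = a.astype(np.float16).astype(np.float32)
    b16 = b.astype(np.float16).astype(np.float32)
    # The product of two FP16 values is exact in FP32, so this matches
    # "FP16 multiply, FP32 accumulate" up to the order of summation.
    return (a16 @ b16).astype(np.float64)


def recursive_qr(a, gemm=fp64_gemm, base=16):
    """Recursive block Gram-Schmidt QR of a tall m x n matrix (m >= n).

    A = [A1 A2]:  A1 = Q1 R11          (recursive call)
                  R12 = Q1^T A2        (large GEMM)
                  A2' = A2 - Q1 R12    (large GEMM)
                  A2' = Q2 R22         (recursive call)
    """
    m, n = a.shape
    if n <= base:
        # Narrow panel: factor directly in FP64 (LAPACK Householder QR),
        # so only the GEMMs are affected by the low-precision emulation.
        q, r = np.linalg.qr(a)
        return q, r
    k = n // 2
    q1, r11 = recursive_qr(a[:, :k], gemm, base)
    r12 = gemm(q1.T, a[:, k:])
    q2, r22 = recursive_qr(a[:, k:] - gemm(q1, r12), gemm, base)
    q = np.hstack([q1, q2])
    r = np.block([[r11, r12], [np.zeros((n - k, k)), r22]])
    return q, r


def orthogonality_error(q):
    """||Q^T Q - I||_2, computed in FP64."""
    return np.linalg.norm(q.T @ q - np.eye(q.shape[1]), 2)


rng = np.random.default_rng(0)
a = rng.standard_normal((512, 128))
for name, gemm in [("fp64", fp64_gemm), ("fp16/fp32", tc_gemm)]:
    q, r = recursive_qr(a, gemm)
    backward = np.linalg.norm(a - q @ r) / np.linalg.norm(a)
    print(f"{name:10s} orthogonality {orthogonality_error(q):.1e}  "
          f"backward {backward:.1e}")
```

Splitting the columns in half means the two GEMMs at the top level have an inner or outer dimension of n/2, the large, regular shape that GEMM-centric hardware handles best. Comparing the two printed lines gives a feel for how much orthogonality is lost when only the GEMMs drop to half precision; the actual magnitudes depend on the matrix and on the real hardware.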


8:00 AM - 10:00 AM CT
Online via MSFT Teams

Dr. Panruo Wu, dissertation advisor

Faculty, students and the general public are invited.