Department of Computer Science at UH

University of Houston

Department of Computer Science

In Partial Fulfillment of the Requirements for the Degree of
Master of Science

Naveen Mohanam

Will defend his thesis


INTEGRATION OF LINEAR ALGEBRA NUMERICAL METHODS WITH A DBMS

Abstract

Linear algebra computations in general, and matrix operations in particular, are fundamental to a broad variety of engineering and computational science problems. We focus on computing Principal Component Analysis (PCA), a mathematical model for dimensionality reduction. Computing PCA involves computing the Singular Value Decomposition (SVD) of the data set's correlation matrix. Programming these numerical methods inside a DBMS is difficult because its architecture is not suited to intensive numerical computation. To overcome the inherent limitations of the DBMS, we explore an alternative approach based on reuse: calling the standard optimized numerical library LAPACK (Linear Algebra PACKage). We study several alternatives for summarizing the data set, propose improvements to those existing methods, and show how to efficiently call the SVD routine available in the library using SQL mechanisms such as User-Defined Functions (UDFs). We pay particular attention to scalability; to that end, we push heavy processing into main memory and exploit multiple CPU cores. We benchmark these alternatives on a modern DBMS with large data sets. We demonstrate that it is feasible to compute the SVD by first summarizing the data set efficiently with an aggregate UDF, then pushing the intensive numerical computation involved in the SVD to RAM by calling the library through a UDF. Furthermore, we show that a multi-threaded version of LAPACK lets us leverage parallel processing on multi-core CPUs. Our solution requires only one table scan of the data set, displays linear scalability, is fully parallel, runs at peak CPU speed, and works on any DBMS that supports aggregate UDFs and an interface for importing libraries. In short, our approach enables fast computation of linear algebra methods inside a DBMS using only SQL mechanisms.
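
As a rough illustration of the "summarize in one scan, then decompose in memory" idea, the sketch below accumulates a small summary of the data set in a single pass and derives the correlation matrix from it. The abstract does not say which summary the thesis uses; the choice here (the row count n, the column sums L, and the sums of cross-products Q) is an assumption, and the function names `summarize` and `correlation` are hypothetical. It is plain Python rather than an aggregate UDF.

```python
import math

def summarize(rows, d):
    """One pass over the rows, accumulating n, L and Q (an assumed summary)."""
    n = 0
    L = [0.0] * d                       # L[a]    = sum of column a
    Q = [[0.0] * d for _ in range(d)]   # Q[a][b] = sum of x[a] * x[b]
    for x in rows:
        n += 1
        for a in range(d):
            L[a] += x[a]
            for b in range(d):
                Q[a][b] += x[a] * x[b]
    return n, L, Q

def correlation(n, L, Q):
    """Derive the d x d correlation matrix from the summary alone."""
    d = len(L)
    # n*Q[a][a] - L[a]^2 equals n^2 times the variance of column a;
    # a constant column makes this zero and is not handled in this sketch.
    scale = [math.sqrt(n * Q[a][a] - L[a] * L[a]) for a in range(d)]
    return [[(n * Q[a][b] - L[a] * L[b]) / (scale[a] * scale[b])
             for b in range(d)]
            for a in range(d)]

# Tiny made-up data set with d = 2 columns, for illustration only.
rows = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
n, L, Q = summarize(rows, 2)
rho = correlation(n, L, Q)
```

The summary has a fixed size that depends only on the number of columns d, not on the number of rows, so the resulting d x d matrix fits in main memory, where it could be handed to a library SVD routine.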

 

Date: Wednesday, July 25, 2012
Time: 2:00 PM
Place: 550-PGH

Faculty, students, and the general public are invited.
Advisor: Dr. Carlos Ordonez