In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
Wellington Cabrera Arevalo
will defend his dissertation proposal
Optimized Algorithms for Data Analysis in Parallel Databases
Database Management Systems (DBMSs) are frequently used as repositories for large data sets. Because DBMSs are designed to answer relational queries, the common perception is that they are not suitable for complex data analysis. However, analyzing data inside the DBMS has clear benefits: it avoids the tedious work of exporting data, and it improves data integrity and security. In this work, we present an approach that dramatically increases the speed of data analysis on large data sets inside row-, column- and array-based DBMSs. We compute a summarization matrix directly from the data stored in the DBMS. This computation runs in parallel, both on clusters and on multicore architectures. The summarization matrix then serves as a common foundation for optimizing several analytical algorithms. We focus on Principal Component Analysis, Variable Selection and Linear Regression. We show experimentally that our approach makes it possible to obtain results orders of magnitude faster than standard approaches.
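The abstract does not define the summarization matrix. To illustrate the idea, assume it is Γ = Σ z_i z_iᵀ, where z_i = [1, x_i, y_i] stacks a constant, the d predictor values and the response of row i. This definition is an assumption for illustration only, and the proposal's exact definition may differ. Because Γ is a sum over rows, each data partition can be summarized independently and the partial results added together. Linear regression can then be solved from Γ alone, without scanning the data again. All function names below are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Row = Tuple[Sequence[float], float]  # (x vector of d predictors, y)


def partial_gamma(rows: Sequence[Row], d: int) -> Matrix:
    """Sum of z z^T over one partition, with z = [1, x_1..x_d, y] (assumed layout)."""
    k = d + 2
    gamma = [[0.0] * k for _ in range(k)]
    for x, y in rows:
        z = [1.0, *x, y]
        for a in range(k):
            for b in range(k):
                gamma[a][b] += z[a] * z[b]
    return gamma


def merge_gammas(parts: Sequence[Matrix]) -> Matrix:
    """Partial matrices are additive, so merging is an element-wise sum."""
    k = len(parts[0])
    return [[sum(p[a][b] for p in parts) for b in range(k)] for a in range(k)]


def parallel_gamma(partitions: Sequence[Sequence[Row]], d: int) -> Matrix:
    # Each partition is summarized independently. Threads only illustrate the
    # pattern; a DBMS would run these scans on separate cores or cluster nodes.
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(lambda rows: partial_gamma(rows, d), partitions))
    return merge_gammas(parts)


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def regression_from_gamma(gamma: Matrix, d: int) -> List[float]:
    """Least-squares [intercept, b_1..b_d] computed from Gamma alone (normal equations)."""
    # The top-left (d+1)x(d+1) block is sum([1,x][1,x]^T); the last column holds sum([1,x]*y).
    a = [row[: d + 1] for row in gamma[: d + 1]]
    rhs = [gamma[i][d + 1] for i in range(d + 1)]
    return solve(a, rhs)
```

In this sketch, the element-wise merge gives the same Γ no matter how the rows are partitioned. That property is why a single parallel pass over the data is enough before the model is solved from the small (d+2)×(d+2) matrix.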
Date: Friday, December 4, 2015
Time: 12:00 PM
Place: PGH 550
Advisor: Dr. Carlos Ordonez
Faculty, students, and the general public are invited.