Dissertation Proposal - University of Houston

Dissertation Proposal

In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

Wellington Cabrera Arevalo

will defend his dissertation proposal

Optimized Algorithms for Data Analysis in Parallel Databases


Abstract

Database Management Systems (DBMSs) are frequently used as repositories for large data sets. Since DBMSs are designed to answer relational queries, the common perception is that they are not suitable for complex data analysis. On the other hand, performing data analysis inside the DBMS brings benefits such as avoiding tedious data export and improving data integrity and security. In this work, we present an approach that dramatically increases the speed of data analysis for large data sets in DBMSs, considering row-, column-, and array-based DBMSs. We compute a summarization matrix directly from data stored in the DBMS. This computation is performed in parallel, both in clusters and on multicore architectures. This summarization matrix becomes a common foundation for optimizing several analytical algorithms. We focus on Principal Component Analysis, Variable Selection, and Linear Regression. We show experimentally that our approach makes it possible to obtain results orders of magnitude faster than standard approaches.
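
The abstract does not define the summarization matrix, so the following is only an illustrative sketch under an assumption: that the summary holds the row count n, the per-column sums L, and the sums of cross-products Q, a common choice of sufficient statistics. All function names are hypothetical, and plain Python lists stand in for DBMS partitions. The sketch shows why such a summary can be computed in parallel (partial summaries merge by addition) and how a covariance matrix and a one-predictor linear regression can then be derived without rescanning the data.

```python
from typing import List, Sequence, Tuple

# Assumed summary layout (not taken from the abstract): (n, L, Q)
Summary = Tuple[int, List[float], List[List[float]]]


def summarize(rows: Sequence[Sequence[float]]) -> Summary:
    """One pass over a non-empty partition: count, column sums, cross-product sums."""
    d = len(rows[0])
    n = 0
    L = [0.0] * d
    Q = [[0.0] * d for _ in range(d)]
    for x in rows:
        n += 1
        for a in range(d):
            L[a] += x[a]
            for b in range(d):
                Q[a][b] += x[a] * x[b]
    return n, L, Q


def merge(s: Summary, t: Summary) -> Summary:
    """Combine two partial summaries; addition makes the order irrelevant."""
    n1, L1, Q1 = s
    n2, L2, Q2 = t
    d = len(L1)
    L = [L1[a] + L2[a] for a in range(d)]
    Q = [[Q1[a][b] + Q2[a][b] for b in range(d)] for a in range(d)]
    return n1 + n2, L, Q


def covariance(s: Summary) -> List[List[float]]:
    """Population covariance from the summary: Q/n - (L/n)(L/n)^T."""
    n, L, Q = s
    d = len(L)
    return [[Q[a][b] / n - (L[a] / n) * (L[b] / n) for b in range(d)]
            for a in range(d)]


def simple_regression(s: Summary, x_col: int, y_col: int) -> Tuple[float, float]:
    """Least-squares slope and intercept of y on a single x, from the summary only."""
    n, L, _ = s
    cov = covariance(s)
    slope = cov[x_col][y_col] / cov[x_col][x_col]
    intercept = L[y_col] / n - slope * L[x_col] / n
    return slope, intercept


# Two partitions of rows (x, y) lying on y = 2x + 1, summarized independently.
part_a = [(1.0, 3.0), (2.0, 5.0)]
part_b = [(3.0, 7.0), (4.0, 9.0)]
total = merge(summarize(part_a), summarize(part_b))
slope, intercept = simple_regression(total, x_col=0, y_col=1)
```

In this sketch the raw rows are read exactly once; the regression and covariance steps work only on the small summary. Principal Component Analysis could start from the same covariance matrix, although how the proposal itself does so is not stated in the abstract.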


Date: Friday, December 4, 2015
Time: 12:00 PM
Place: PGH 550
Advisor: Dr. Carlos Ordonez

Faculty, students, and the general public are invited.