In Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy
Will defend his dissertation
On-Line Analytical Processing (OLAP) is a family of database algorithms and techniques used to compute multiple aggregation queries on an input fact table. Because it is generally accepted that OLAP queries are both slow and hard to optimize when evaluated within a DBMS, a significant portion of cube computations is often performed outside the database system. In this dissertation, we challenge this traditional practice with three technical contributions, at three different levels, that involve a new algorithm and multiple optimizations. First, at the lowest computation level, we improve OLAP data cube precomputation with innovative memory-only data structures that can be seamlessly integrated within a user-defined function, an extensibility mechanism provided by SQL. Second, we extend OLAP exploratory operations by introducing horizontal aggregations, a novel operation in SQL that combines pivoting and multidimensional aggregation of the data set into a single query. Such an operation is essential as a pre-processing step for data mining algorithms, which do not generally accept a data set with a vertical layout. We show that horizontal aggregations are faster and more flexible than a built-in pivot operator. Finally, treating OLAP as a data mining tool, we show that it can work synergistically with statistics. By combining OLAP cubes with a parametric statistical test, we develop Cube Statistical Tests, a novel analytic algorithm that isolates predictive attributes. This is accomplished with similarity comparisons between two cube subgroups based on a probabilistic function. In addition, we carefully study the validity of discovered patterns by performing a reliability analysis against Association Rules, a standard pattern discovery algorithm. We show that Cube Statistical Tests are superior in both the quality of discovered patterns and speed.
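To make the idea of a horizontal aggregation concrete, the following is a minimal sketch in Python rather than SQL, and it is not the dissertation's implementation. It assumes a small hypothetical fact table with columns `store`, `quarter`, and `sales`; the function name `horizontal_aggregation` is also illustrative. It groups by one column, sums a measure, and spreads the values of a second column across the output as separate columns, so each group occupies exactly one row. Filling missing combinations with 0 is a simplifying assumption of this sketch.

```python
from collections import defaultdict


def horizontal_aggregation(rows, group_key, pivot_key, measure):
    """Sum `measure` per (group_key, pivot_key) and return one row per group.

    Each distinct value of `pivot_key` becomes an output column.
    Combinations absent from the input are reported as 0 (an assumption
    made here for simplicity).
    """
    pivot_values = sorted({r[pivot_key] for r in rows})
    # One dictionary of pivot-column totals per group, initialized to 0.
    totals = defaultdict(lambda: dict.fromkeys(pivot_values, 0))
    for r in rows:
        totals[r[group_key]][r[pivot_key]] += r[measure]
    return [{group_key: g, **cols} for g, cols in sorted(totals.items())]


# Hypothetical fact table in a vertical layout: one row per transaction.
fact_table = [
    {"store": "A", "quarter": "Q1", "sales": 10},
    {"store": "A", "quarter": "Q1", "sales": 5},
    {"store": "A", "quarter": "Q2", "sales": 7},
    {"store": "B", "quarter": "Q2", "sales": 3},
]

result = horizontal_aggregation(fact_table, "store", "quarter", "sales")
for row in result:
    print(row)
```

The output has one row per store with one column per quarter, which is the tabular layout that data mining algorithms typically expect as input.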