In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
will defend his dissertation
New Approaches to Hierarchical Modeling — Frameworks, Algorithms, and Applications
Obtaining hierarchical organizations of knowledge is important in many domains. Creating such hierarchies requires improved techniques for subdividing entities hierarchically according to their similarities and differences. New techniques for organizing documents in hierarchies, for automatic document retrieval, and for hierarchical query clustering are being made available at a fast pace. In this work, we investigate new methods to induce hierarchical models with the goals of obtaining better predictive models, facilitating the creation of background knowledge with respect to an underlying class distribution, obtaining hierarchical groupings of a set of objects based on the background knowledge they share, detecting sub-classes within an existing class distribution, and providing methods to evaluate hierarchical groupings. This effort has led to the development of (1) TPRTI, a new regression tree induction approach which uses turning points (candidate split points computed before the recursive process takes place) to recursively split the node datasets; (2) PATHFINDER, a new classification tree induction approach capable of inducing very short trees with high accuracies at the price of not classifying examples deemed difficult to classify; (3) Avalanche, a new hierarchical divisive clustering approach which takes a distance matrix as input and forms clusters maximizing inter-cluster distances; (4) STAXAC, a new agglomerative clustering approach which creates supervised taxonomies; unlike traditional agglomerative clustering, which uses proximity as the single criterion for merging, it uses both proximity and class label information to obtain hierarchical groupings of a set of objects.
We applied the techniques we developed (1) to molecular phylogenetic-based taxonomy generation, and found that this new approach and the obtained supervised taxonomies can help biologists to better characterize organisms according to characteristics of interest such as diseases, growth rate, etc.; (2) to data editing, where we were able to enhance the accuracy of a k-nearest neighbor classifier by removing minority class examples from clusters that were extracted from a supervised taxonomy; (3) to meta learning, where we developed new algorithms that operate on supervised taxonomies and compute both the distribution of the classes within a dataset and the difficulty of classifying examples belonging to a particular dataset.
Date: Tuesday, November 24, 2015
Time: 11:15 AM
Place: PGH 550
Advisor: Dr. Christoph F. Eick
Faculty, students, and the general public are invited.