In Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy
Will defend her dissertation proposal
Advances in sensor networks and satellite systems have produced large amounts of spatial data, so tools and techniques that automatically extract meaningful information from such data are crucial. In this research, a generic framework to aid in analyzing relationships between clusters is developed, initially centering on spatial clusters; a computerized tool will be developed to compare different clusters and clusterings based on different characteristics and to summarize their relationships in the form of predicates and measurements. One goal of the dissertation is to obtain a deeper understanding of which relationships between clusters are important and how to describe them, and to provide computerized tools that analyze and reason about such relationships automatically. The second goal of the dissertation is to make unique scientific contributions to change analysis and to dual and multi-run clustering.
Change analysis is important for tackling major real-world problems, such as analyzing climate change, understanding patterns of disease outbreaks, and analyzing crime hotspots. This dissertation focuses on analyzing changes in spatial data using a cluster analysis approach. Methodologies and tools will be developed that automatically detect important changes between two datasets, such as concept drift and the occurrence of novel concepts.
Moreover, dual clustering frameworks will be explored that cluster two datasets in parallel, maximizing desirable relationships between the two clusterings. Dual clustering aims to obtain clusters that relate to each other, so that the clustering of one dataset depends on the clustering of the other.
Analyzing cluster relationships is also important for multi-run clustering. The goal of multi-run clustering is to obtain better clusterings by assembling good clusters that originate from multiple runs. In particular, cluster evaluation and novelty measures that are suitable for multi-run clustering will be investigated, and algorithms will be developed that create a final clustering from a given set of clusters.
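To illustrate the idea of assembling a final clustering from clusters produced by multiple runs, the following is a minimal sketch and not the method proposed in the dissertation. It assumes that clusters are non-empty sets of object identifiers, that a quality function is supplied by the caller (cluster size is used here purely as a placeholder), and that novelty is approximated by Jaccard overlap with the clusters already selected. The names `assemble_clustering`, `quality`, and `max_overlap` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap of two non-empty sets of object identifiers."""
    return len(a & b) / len(a | b)


def assemble_clustering(candidates, quality, max_overlap=0.1):
    """Greedily select high-quality clusters that are novel with respect
    to the clusters already chosen (hypothetical selection rule)."""
    chosen = []
    # Visit candidates from highest to lowest quality.
    for cluster in sorted(candidates, key=quality, reverse=True):
        # Keep the cluster only if it overlaps little with every kept cluster.
        if all(jaccard(cluster, kept) <= max_overlap for kept in chosen):
            chosen.append(cluster)
    return chosen


# Two hypothetical clustering runs over objects 1..7.
runs = [
    [frozenset({1, 2, 3}), frozenset({4, 5, 6, 7})],
    [frozenset({1, 2, 3, 4}), frozenset({5, 6, 7})],
]
candidates = [cluster for run in runs for cluster in run]

# Placeholder quality measure: cluster size.
final = assemble_clustering(candidates, quality=len)
print(final)
```

In this toy input the selection keeps one cluster from each region of the data and discards the near-duplicates from the other run. A real quality measure and novelty threshold would depend on the evaluation measures the dissertation sets out to investigate.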