In Partial Fulfillment of the Requirements for the Degree of
Master of Science
Will defend his thesis
Polygon models are essential in spatial data mining applications for analyzing relationships and change between clusters. Moreover, they can be used to represent and visualize spatial clusters. However, there has been little research on the use of polygons as cluster models in data mining applications. Algorithms exist that generate polygon models for a set of points, but most of them were developed for computer graphics, computer vision, and pattern recognition applications, which use different, application-specific criteria. Many of these algorithms do not meet the requirements of data mining applications, such as removing outliers, dealing with separate regions in clusters with varying densities, and working with clusters of different shapes. To address these problems, this thesis investigates different approaches for generating polygon models for spatial clusters and proposes best practices.
Existing polygon generation algorithms were tested and evaluated on different datasets, and their applicability to data mining was investigated. As a result of this analysis, preprocessing and post-processing techniques were proposed to enhance the quality of the generated polygon models. A software framework was developed that takes a cluster as input and generates a polygon model for the cluster as output. The framework uses preprocessing to remove outliers and detect subregions, and post-processing to create a visually appealing, simple, smooth polygon for each region in the cluster. Moreover, a novel polygon fitness function is proposed, which is used to maximize the smoothness of the generated polygon.
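
As a rough illustration of the preprocess-then-generate idea described above, the following Python sketch removes outliers from a cluster and then builds a polygon around the remaining points. It is not the thesis's framework: the outlier rule (mean distance to the k nearest neighbours compared against a multiple of the median), the parameter names k and factor, and the use of a convex hull as the polygon generator are all assumptions chosen for brevity. Subregion detection, smoothing, and the fitness function are omitted.

```python
from math import dist


def remove_outliers(points, k=3, factor=2.0):
    """Drop points whose mean distance to their k nearest neighbours exceeds
    `factor` times the median of that score (a hypothetical rule)."""
    scores = []
    for p in points:
        distances = sorted(dist(p, q) for q in points if q is not p)
        scores.append(sum(distances[:k]) / k)
    median = sorted(scores)[len(scores) // 2]
    return [p for p, s in zip(points, scores) if s <= factor * median]


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise.
    A convex hull is only a stand-in for a real polygon generator."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of the cross product (a - o) x (b - o)
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half(seq):
        chain = []
        for p in seq:
            # Pop while the last turn is clockwise or collinear.
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]  # last point repeats as the first of the other half

    return half(pts) + half(reversed(pts))


def polygon_for_cluster(points, k=3, factor=2.0):
    """Cluster in, polygon out: outlier removal followed by hull generation."""
    return convex_hull(remove_outliers(points, k, factor))
```

For example, calling polygon_for_cluster on a 3 x 3 grid of points plus one distant point at (10, 10) discards the distant point and returns the four corners of the grid. A convex hull cannot follow concave cluster shapes or separate regions, which is one reason a data mining setting calls for the more specialized techniques the thesis investigates.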