In Partial Fulfillment of the Requirements for the Degree of
Doctor of Philosophy
Will defend his dissertation
Geo-referenced datasets are being generated at increasing rates, which creates a need for tools that extract knowledge automatically. Traditional data mining techniques mostly focus on finding global patterns and lack the ability to systematically discover regional patterns. Finding interesting regional patterns is important because many patterns exist only at the regional level and not at the global level. This dissertation focuses on developing methods to uncover hidden correlation patterns and on developing regional regression tools. First, we introduce a novel PCA-based approach to discover interesting regions along with regional correlation patterns that exhibit strong relationships in the attribute space. We also propose a generic similarity measure to assess the structural similarity between regions. Second, we propose a regional regression framework, called REG^2, that discovers regional regression functions capturing the spatially varying relationships between the dependent variable and the independent variables. Third, in order to evaluate REG^2 and other geo-regression methods, we propose various prediction evaluation metrics. We developed several plug-in fitness functions that employ PCA, AIC, regularization, and example weighting to improve the capability of uncovering the underlying structure of the data without assuming predetermined boundaries, such as zip codes or grids. The proposed frameworks are evaluated in case studies that center on identifying causes of arsenic contamination in Texas water wells and determining the spatially varying effects of house properties on house prices.