Department of Computer Science at UH

University of Houston

Department of Computer Science

In Partial Fulfillment of the Requirements for the Degree of
Master of Science

Bangsheng Sui

Will defend his thesis

Information Gain Feature Selection Based on Feature Interactions

Abstract

Analyzing high-dimensional data is a major challenge in machine learning. To deal with the curse of dimensionality, many effective and efficient feature selection algorithms have been developed in recent years. However, most of them assume that features are independent; they identify relevant features mainly by each feature's individual correlation with the target concept. Such algorithms can perform well when the independence assumption holds, but they may perform poorly in domains where features interact. Because of feature interactions, a feature that has little correlation with the target concept on its own can in fact be highly correlated with it when considered together with other features. Removing such features can severely harm the performance of the classification model.
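The classic illustration of this effect is an XOR target. The short sketch below is an illustrative example, not taken from the thesis. It computes information gain, IG(C; X) = H(C) - H(C | X), on a toy dataset where C = X1 XOR X2. Each feature alone carries zero information about C, yet the pair determines C completely, so a selector that scores features one at a time would discard both. The helper names `entropy` and `information_gain` are chosen here for illustration.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def information_gain(feature, target):
    """IG(C; X) = H(C) - H(C | X) for discrete feature/target sequences."""
    n = len(target)
    cond = 0.0
    for v, count in Counter(feature).items():
        # Entropy of the target restricted to rows where the feature equals v.
        subset = [t for f, t in zip(feature, target) if f == v]
        cond += (count / n) * entropy(subset)
    return entropy(target) - cond

# XOR target: each feature alone says nothing about C; together they determine it.
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
c = [a ^ b for a, b in zip(x1, x2)]

print(information_gain(x1, c))                 # 0.0
print(information_gain(x2, c))                 # 0.0
print(information_gain(list(zip(x1, x2)), c))  # 1.0
```

Treating the pair `(x1, x2)` as a single joint feature is what exposes the interaction that the individual scores miss.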

In this thesis, we first present a general view of feature interaction and formally define it in terms of information theory. We then propose a practical algorithm that identifies feature interactions and performs feature selection based on them. Finally, we compare the performance of our algorithm with several well-known feature selection algorithms that assume feature independence. The comparison shows that, by taking feature interactions into account, our algorithm can achieve better performance on datasets where interactions abound.
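One common information-theoretic way to quantify a two-way interaction is interaction gain, I(X; Y; C) = IG(C; X, Y) - IG(C; X) - IG(C; Y). A positive value means the pair tells us more about C jointly than the sum of what each feature tells us alone. The sketch below is a minimal illustration under that assumption. It may differ from the exact definition and algorithm developed in the thesis. The selection rule in `select_features` (keep a feature if it is individually relevant or belongs to a pair whose interaction gain exceeds `threshold`) and the default `threshold=0.1` are hypothetical choices made only for this example.

```python
from collections import Counter
from itertools import combinations
from math import log2

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def information_gain(feature, target):
    """IG(C; X) = H(C) - H(C | X) for discrete feature/target sequences."""
    n = len(target)
    cond = 0.0
    for v, count in Counter(feature).items():
        subset = [t for f, t in zip(feature, target) if f == v]
        cond += (count / n) * entropy(subset)
    return entropy(target) - cond

def interaction_gain(x, y, target):
    """I(X;Y;C) = IG(C; X,Y) - IG(C; X) - IG(C; Y); positive means synergy."""
    joint = list(zip(x, y))
    return (information_gain(joint, target)
            - information_gain(x, target)
            - information_gain(y, target))

def select_features(columns, target, threshold=0.1):
    """Hypothetical rule: keep a feature if it is relevant on its own or if it
    belongs to at least one pair with interaction gain above the threshold."""
    selected = {name for name, col in columns.items()
                if information_gain(col, target) > threshold}
    for a, b in combinations(columns, 2):
        if interaction_gain(columns[a], columns[b], target) > threshold:
            selected.update((a, b))
    return sorted(selected)

# Toy data: target is x1 XOR x2; "noise" is unrelated to the target.
columns = {
    "x1":    [0, 0, 1, 1, 0, 0, 1, 1],
    "x2":    [0, 1, 0, 1, 0, 1, 0, 1],
    "noise": [0, 0, 0, 0, 1, 1, 1, 1],
}
target = [a ^ b for a, b in zip(columns["x1"], columns["x2"])]
print(select_features(columns, target))  # ['x1', 'x2']
```

A purely individual ranking would score all three features at zero here; the pairwise interaction term is what keeps `x1` and `x2` and drops `noise`. Checking all pairs costs O(d²) information-gain evaluations for d features, which is one reason a practical algorithm would need a cheaper search strategy.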

Date: Monday, November 11, 2013
Time: 10:00 AM
Place: PGH 550

Faculty, students, and the general public are invited.
Advisor: Prof. Ricardo Vilalta