In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
Kinjal Dhar Gupta
will defend his dissertation
Robust Domain Adaptation Using Active Learning
Traditional machine learning algorithms assume training and test datasets are generated from the same underlying distribution, which is not true for most real-world datasets. As a result, a model trained on the training dataset fails to produce good classification accuracy on the test dataset. One way to mitigate this problem is to use domain adaptation techniques; these techniques build a new model on the unlabeled test dataset (target dataset) by transferring information from a related but labeled training dataset (source dataset), even when their underlying distributions are different. Another important issue is that in domain adaptation, there is no allowance for obtaining class labels of the test dataset during the training phase. This issue can be handled by active learning techniques, which assume the existence of a budget that can be used to label instances on the target domain. Active learning finds the most informative instances of the test dataset, which an expert can then label to improve classification accuracy on the unlabeled test dataset. Domain adaptation also assumes that the class-conditional distributions across the two domains are the same, which may not be true in many cases.
The goal of this research is to build an optimal classifier on the target dataset by using information related to model complexity. We propose a novel domain adaptation technique using active learning to find the optimal value of a parameter of a class of models that yields the best classifier on the target dataset without assuming the equivalence of the class-conditional probabilities across the domains. We combine the prior distribution of the parameter, obtained from the source, with its likelihood, obtained from a sample of the most informative labeled instances of the target dataset. This research also proposes a novel data alignment technique that allows the use of the source model directly on the target if the distributions differ due to a linear shift, thus avoiding building a completely new classifier on the target domain.
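The combination of a source-derived prior with a target-derived likelihood can be illustrated with a minimal sketch. This is not the dissertation's actual procedure: the candidate grid, the Gaussian-shaped `log_prior` and `log_likelihood` functions, and the helper name `posterior_over_parameter` are all hypothetical and chosen only to show how the two sources of information might be merged into a posterior over a model parameter.

```python
import math


def posterior_over_parameter(candidates, log_prior, log_likelihood):
    """Return a normalized posterior over candidate parameter values.

    posterior(c) is proportional to prior(c) * likelihood(c); the work is
    done in log space and shifted by the maximum for numerical stability.
    """
    log_post = [log_prior(c) + log_likelihood(c) for c in candidates]
    shift = max(log_post)
    weights = [math.exp(lp - shift) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical grid of values for a model-complexity parameter.
candidates = [1, 2, 3, 4, 5]


def log_prior(c):
    # Assumption: the source domain favors a parameter value near 2.
    return -0.5 * (c - 2) ** 2


def log_likelihood(c):
    # Assumption: the labeled target instances favor a value near 4.
    return -0.5 * (c - 4) ** 2


posterior = posterior_over_parameter(candidates, log_prior, log_likelihood)
best = candidates[posterior.index(max(posterior))]
print(best)
```

In this toy setting the selected value lies between the source and target preferences, which reflects the intended trade-off: the source supplies prior knowledge, and the small labeled target sample corrects it.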
Date: Friday, July 15, 2016
Time: 11:00 AM
Place: HBS 350
Advisor: Dr. Ricardo Vilalta
Faculty, students, and the general public are invited.