
[Defense] Adversarial and Non-Adversarial Approaches for Imbalanced Data Classification


In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
Hadi Mansourifar
will defend his dissertation
Adversarial and Non-Adversarial Approaches for Imbalanced Data Classification


Abstract

In this research, we propose novel adversarial and non-adversarial methods for imbalanced data classification. The class imbalance problem arises in many real-world applications, from cybersecurity to health. While non-adversarial approaches like SMOTE remain popular in certain domains, adversarial approaches have been a growing trend since the rise of deep neural networks. Data-driven approaches to imbalanced data classification suffer from two major problems: (i) lack of diversity and (ii) uncertainty. In this research, we propose a set of novel approaches to address these problems.

First, we proposed Cross-Concatenation, the first projection-based method to address the imbalanced data classification problem. Cross-Concatenation is the first projection method that can increase the size of both the minority and majority classes. We proved that Cross-Concatenation can create larger margins with better class separation. Unlike SMOTE and its variations, Cross-Concatenation is not based on random procedures; running it on fixed training and test data therefore yields the same results every time. This stability is one of the most important advantages of Cross-Concatenation over SMOTE. Moreover, our experimental results show that Cross-Concatenation is competitive with SMOTE and its variants, the most popular over-sampling approaches, in terms of F1 score and AUC in the majority of test cases.
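The abstract does not spell out the projection itself, so the sketch below is only one plausible reading of the idea, not the dissertation's actual algorithm: every minority instance is deterministically paired and concatenated with every majority instance, so both classes grow and repeated runs on fixed data give identical output. The function name and the pairing rule are our assumptions.

```python
import numpy as np

def cross_concatenate(X_min, X_maj):
    """Hypothetical sketch of a cross-concatenation step: pair every
    minority instance with every majority instance and concatenate the
    feature vectors. The pairing is deterministic, so repeated runs on
    fixed data produce identical augmented sets (the stability property
    claimed in the abstract). The dissertation's actual projection may
    differ."""
    new_min = []  # concatenated instances labeled as minority
    new_maj = []  # concatenated instances labeled as majority
    for a in X_min:
        for b in X_maj:
            new_min.append(np.concatenate([a, b]))  # minority-first pair
            new_maj.append(np.concatenate([b, a]))  # majority-first pair
    return np.array(new_min), np.array(new_maj)

# Toy usage: 2 minority and 3 majority instances in 4 dimensions
rng = np.random.default_rng(0)
X_min = rng.normal(size=(2, 4))
X_maj = rng.normal(size=(3, 4))
aug_min, aug_maj = cross_concatenate(X_min, X_maj)
print(aug_min.shape, aug_maj.shape)  # (6, 8) (6, 8): both classes grow
```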

Second, we introduced a new concept called virtual big data. Virtual big data is a high-dimensional version of the original training data, generated by concatenating c different original instances. This technique increases the number of training instances from N to C(N, c). We proved that the curse of dimensionality induced by virtual big data can alleviate the vanishing generator gradient problem in V-GANs. Moreover, by concatenating c different instances belonging to c different modes, the risk of mode collapse decreases significantly when the diversity maximization function is used. Our experimental results showed that data augmented by V-GAN can improve imbalanced data classification results.
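The combinatorial expansion from N to C(N, c) instances can be illustrated directly. The sketch below (with the hypothetical function name virtual_big_data) shows only that construction; how V-GAN then consumes the virtual instances is not covered here.

```python
from itertools import combinations
from math import comb

import numpy as np

def virtual_big_data(X, c):
    """Sketch of the virtual-big-data construction described in the
    abstract: every combination of c distinct original instances is
    concatenated into one (c*d)-dimensional virtual instance, growing
    the training set from N to C(N, c) samples."""
    return np.array([np.concatenate(group) for group in combinations(X, c)])

# Toy usage: N=6 instances in d=3 dimensions, concatenated c=2 at a time
X = np.arange(18, dtype=float).reshape(6, 3)
V = virtual_big_data(X, c=2)
print(V.shape)                # (15, 6): C(6, 2) = 15 virtual instances
assert V.shape[0] == comb(6, 2)
```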

Third, we proposed a novel type of Self-Supervised GAN called Self-Competitional GAN (SCOM-GAN) to increase the ability of conventional DCGANs to generate high-quality results. We showed that it is straightforward to upgrade a DCGAN to its SCOM-GAN version by adding an auxiliary classifier and dynamic pseudo-labeling. The SCOM version not only reaches better results but also decreases the number of iterations required to reach the minimum FID score. Furthermore, SCOM-GANs decrease the time required to train a Self-Supervised GAN.

Fourth, we proposed a novel measure called RFVL to make GAN evaluation explainable. We showed that RFVL is a stable measure for evaluating the impact of hyperparameter changes compared to the FID score and GAN training loss. We also proved that RFVL can successfully disclose the early signs of collapsing diversity and over-fitting. Based on RFVL, we defined CI and OFI, which are significantly helpful in revealing the pros and cons of each GAN variation.
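As a rough illustration of the SCOM-GAN upgrade described above, the PyTorch sketch below adds an auxiliary classifier head to a DCGAN-style discriminator. The layer sizes, the number of pseudo-classes, and the dynamic pseudo-labeling rule are all our assumptions; the abstract states only that an auxiliary classifier and dynamic pseudo-labels are added.

```python
import torch
import torch.nn as nn

class DiscriminatorWithAux(nn.Module):
    """Illustrative sketch only: a DCGAN-style discriminator trunk with
    an added auxiliary classifier head, in the spirit of the SCOM-GAN
    upgrade described in the abstract. The real architecture and loss
    weighting are not specified there; n_pseudo_classes is hypothetical."""

    def __init__(self, n_pseudo_classes=4):
        super().__init__()
        self.body = nn.Sequential(  # shared convolutional trunk
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Flatten(),
        )
        self.adv_head = nn.Linear(64 * 7 * 7, 1)                  # real/fake logit
        self.aux_head = nn.Linear(64 * 7 * 7, n_pseudo_classes)   # pseudo-label logits

    def forward(self, x):
        h = self.body(x)
        return self.adv_head(h), self.aux_head(h)

# Toy usage on a batch of 28x28 single-channel images
x = torch.randn(8, 1, 28, 28)
adv_logit, aux_logits = DiscriminatorWithAux()(x)
print(adv_logit.shape, aux_logits.shape)  # torch.Size([8, 1]) torch.Size([8, 4])
```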


Wednesday, February 2, 2022
2:00 PM - 4:00 PM CT
Online via Zoom

Dr. Weidong (Larry) Shi, dissertation advisor

Faculty, students, and the general public are invited.

Doctoral Dissertation Defense