In Partial Fulfillment of the Requirements for the Degree of Master of Science
will defend his thesis
Attribute Selection and Machine Learning Algorithms for Intrusion Detection
As technology progresses, more critical information is being stored on computers. This allows the data to be stored more compactly and sometimes adds benefits such as searchability. At the same time, these storage systems are often connected through a network to the outside world, which increases the chance of attack because thieves can now access the data remotely. Authentication methods can be used to try to block attackers, but attackers can get around them by obtaining user credentials and then stealing data under the guise of a normal user. To combat this, I looked at how attacker activity differs from normal user activity. I used two categories of data from the Windows-Users and -Intruder simulations Logs dataset, one describing time and the other describing file paths. From these categories I calculated ten attributes. For each attribute I described the intuitive expectation of how attack and normal values would differ. Then I used several methods to analyze the attributes. First, I looked at how a simple threshold on each attribute would perform individually. I also looked at the linear correlation between the attributes. Then I used a white-box machine learning method, trained once with all the attributes and once with only the attributes that performed well with a simple threshold. Next, I used a black-box machine learning method, trained once with all attributes, once with the best individual attributes, and once with the attributes selected by the white-box method. I found that only a couple of the original attributes did well on their own. However, most of the attributes selected by the white-box method were poor individual performers. The black-box method did best with the attributes selected by the white-box method and worst with the best individual performers. The white-box method strongly outperformed the black-box method on attack data but did slightly worse at correctly identifying normal data.
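As an illustration of the first analysis step, evaluating a simple threshold on a single attribute might be sketched as follows. This is only a sketch under stated assumptions, not the method used in the thesis: the function names, the scoring rule (sum of attack detection rate and normal accuracy), and the sample values are all hypothetical.

```python
def threshold_rates(values, labels, threshold, attack_above=True):
    """Return (attack detection rate, normal accuracy) for one threshold.

    labels[i] is True when sample i is attack data. Assumes both classes
    are present, otherwise a division by zero occurs.
    """
    tp = fn = tn = fp = 0
    for value, is_attack in zip(values, labels):
        # Flag the sample as an attack if it falls on the "attack" side.
        flagged = value > threshold if attack_above else value < threshold
        if is_attack:
            tp += flagged
            fn += not flagged
        else:
            fp += flagged
            tn += not flagged
    return tp / (tp + fn), tn / (tn + fp)


def best_threshold(values, labels, attack_above=True):
    """Pick the midpoint between adjacent distinct values with the best score.

    The score (hypothetical choice) is detection rate plus normal accuracy.
    Assumes the attribute takes at least two distinct values.
    """
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max(
        candidates,
        key=lambda t: sum(threshold_rates(values, labels, t, attack_above)),
    )


# Made-up attribute values: three normal sessions, then three attack sessions.
values = [1, 2, 3, 10, 11, 12]
labels = [False, False, False, True, True, True]
cut = best_threshold(values, labels)
print(cut, threshold_rates(values, labels, cut))
```

Repeating this for each of the ten attributes would give a per-attribute ranking that could then be compared against the attributes chosen by a learning method.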
Date: Thursday, April 26, 2018
Time: 3:00 PM
Place: PGH 550
Advisor: Dr. Stephen Huang
Faculty, students, and the general public are invited.