In Partial Fulfillment of the Requirements for the Degree of Master of Science
will defend his thesis
Detecting Phish Using Website Content and URL N-Gram Features
Phishing websites attempt to steal login credentials or other confidential information from Internet users. They are ubiquitous, and their impact and prevalence only increase with time.
Many approaches have been attempted to counteract this threat. This work attacks the problem of detecting phishing websites in three ways. First, it continues the work of "Preventing Digital Identity Theft Using Fundamental Characteristics," by Tanmay Thakur, by converting many of its proposed heuristics and filtering methods into features for a machine learner, and by improving its website collection method to reduce the inherent bias in its legitimate URL set. Second, it adds the occurrence of URL n-grams as features, along with other features derived from those n-grams, in order to better exploit certain patterns in phishing and legitimate URLs. Finally, it adds website similarity features that exploit the fact that phishing websites are often copies of other websites. Because these features measure the degree to which websites are similar, they also serve as an indirect proxy for the contents of the webpage.
These methods yield three particularly interesting classification results. One achieves 99% accuracy with an F1-score of 98.7%. Another achieves a slightly lower accuracy of 98.5% but is trained in only 3.5 minutes. A third classified every legitimate website as legitimate, which means that every website it flagged as a phish in the evaluation was in fact one.
In summary, the method uses content features and features from the URLs of webpages to determine whether or not those pages belong to a phishing website.
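As a rough illustration of the URL n-gram features mentioned above, the following minimal sketch counts overlapping character n-grams in a URL. It is not the thesis's implementation: the function name `url_char_ngrams`, the choice of character-level (rather than token-level) n-grams, the lowercasing step, and the default n = 3 are all assumptions made for this example.

```python
from collections import Counter


def url_char_ngrams(url: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in a lowercased URL.

    Hypothetical helper for illustration only; the thesis may define
    its n-grams differently (for example, over URL tokens).
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    text = url.lower()
    # Slide a window of width n across the string; a URL shorter
    # than n characters yields an empty Counter.
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


if __name__ == "__main__":
    # The URL below is a made-up example, not taken from the thesis data.
    counts = url_char_ngrams("http://example.com/login", n=3)
    for gram, count in counts.most_common(5):
        print(repr(gram), count)
```

Each distinct n-gram could then become one column of a feature vector for a machine learner, with its count (or simple presence) as the value.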
Date: Monday, November 21, 2016
Time: 11:00 AM
Place: PGH 501D
Advisor: Dr. Rakesh Verma
Faculty, students, and the general public are invited.