Dissertation Proposal - University of Houston

Dissertation Proposal

In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

Prasha Shrestha

will defend her dissertation proposal

Cross Domain and Open Set Authorship Attribution


Abstract

Most authorship attribution research focuses on the single-genre scenario, where the known texts from prospective authors share the topic and genre of the text to be attributed. A more practical scenario is the cross-genre problem, where the known texts from prospective authors come from one topic or genre and the text to be attributed comes from a completely different one. Restricting ourselves to attribution problems for which we already possess texts of the same genre would leave many attribution problems unsolved. The cross-genre problem is especially hard because changes in genre and topic can make texts written by a single author look entirely different. The task is then to filter out the topic- and genre-specific attributes of the text so that what remains is owing purely to the author's style.

Just as most work addresses the in-domain rather than the cross-domain problem, most authorship attribution research also focuses on the closed-set problem, where the set of candidate authors is known a priori. In a realistic scenario, this information is very unlikely to be available beforehand. When a document was not written by any of the candidates, a closed-set system will still wrongly attribute it to one of them. Such cases call for an open-set solution, in which the author of a given document may be one of the candidates or someone else entirely. The ability to rule out all of the authors under scrutiny is valuable information that an open-set solution can provide and a closed-set solution cannot.

Our proposal is to work on a realistic scenario in which the open-set and cross-genre problems occur together, and to design approaches that work even when a document is written by an out-of-set author, while leveraging whatever texts by the candidate authors are available to us.


Date: Friday, May 5, 2017
Time: 11:00 AM
Place: PGH 501D
Advisor: Dr. Thamar Solorio

Faculty, students, and the general public are invited.