Calendar - University of Houston

[Defense] Proactive Defense through Automated Attack Generation: A Multi-pronged Study of Generated Deceptive Content

Monday, November 16, 2020

11:00 am - 12:30 pm

In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
Avisha Das
will defend her dissertation
Proactive Defense through Automated Attack Generation: A Multi-pronged Study of Generated Deceptive Content


Social engineering attacks such as phishing and email masquerading, in which a perpetrator impersonates a legitimate entity, are a threat that cybersecurity researchers must contend with. However, despite having a higher probability of success, executing such an attack can be costly in terms of time and manual labor. With advances in machine learning and natural language processing, attackers can now use more sophisticated methods to evade detection. Deep neural learners are capable of natural text generation when trained on large amounts of written text. While these techniques have been tested on creative content generation tasks such as story writing, they have also been abused to generate fake content such as fake news. In a proactive scenario, we presume that defenders would resort to automated methods of attack vector generation. However, applying neural text generation methods to email generation is fairly challenging, owing to the noise and sparsity in emails and the diversity of email writing styles. Moreover, the evaluation and detection of generated content is a challenging and cumbersome task, and current automated metrics do not provide the best possible alternative.

We analyze automated content generation for two tasks: (a) creative content or story generation from writing prompts, and (b) generation of emails from given subject prompts for specific intents. We split the proposed analysis for each task into three parts: (i) content (story/email) generation; (ii) fine-tuning and improving upon the generated content; and (iii) content evaluation. Current baselines, such as word-based pre-transformer Recurrent Neural Networks and pre-trained and fine-tuned transformer language models, suffer from issues like low coherence and relatedness. Therefore, we design a hierarchical generative architecture that handles coherence at the sentence level during generation: using a sentence-embedding-based selector, it iteratively selects the best sentence candidates from a generative language model. Finally, we compare the linguistic quality of the generated text to human-authored text using a set of automated metrics. We also correlate these metrics with a human-based user study to ascertain how well they can distinguish between writing patterns. Moreover, we explore whether system performance differs with the genre of the generated text (stories vs. emails).
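As a rough illustration of the iterative, embedding-based sentence selection described above, the following minimal Python sketch picks, at each step, the candidate sentence most similar to the text generated so far. It is not the dissertation's implementation. The bag-of-words `embed` function stands in for a real sentence encoder, `toy_candidates` stands in for a generative language model, and all function names are hypothetical.

```python
import math
from collections import Counter


def embed(sentence):
    """Toy bag-of-words vector (assumption: stand-in for a sentence encoder)."""
    return Counter(sentence.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_sentences(prompt, candidate_fn, steps):
    """Iteratively append the candidate most similar to the context so far."""
    context = [prompt]
    for step in range(steps):
        candidates = candidate_fn(context, step)
        if not candidates:
            break
        ctx_vec = embed(" ".join(context))
        best = max(candidates, key=lambda s: cosine(embed(s), ctx_vec))
        context.append(best)
    return context[1:]  # generated sentences only, without the prompt


def toy_candidates(context, step):
    """Hypothetical generator: returns fixed candidates for each step."""
    pool = [
        ["Please review the budget report today.",
         "The weather is sunny in Paris."],
        ["Send the budget report comments by Friday.",
         "Cats sleep most of the day."],
    ]
    return pool[step] if step < len(pool) else []


if __name__ == "__main__":
    for sentence in select_sentences("Budget report review", toy_candidates, 2):
        print(sentence)
```

In a real system, the candidate sentences would be sampled from a language model conditioned on the prompt and prior context, and the selector would score them with learned sentence embeddings rather than word counts.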

Monday, November 16, 2020
11:00AM - 12:15PM CT
Online via Zoom (ID 307 815 8361)

Dr. Rakesh Verma, dissertation advisor

Faculty, students and the general public are invited.
