Calendar - University of Houston
[Defense] Preprocessing for Code-Switching Models

Monday, November 22, 2021

3:30 pm - 5:00 pm

In Partial Fulfillment of the Requirements for the Bachelor of Science
Dwija Parikh
will defend her senior honors thesis
Preprocessing for Code-Switching Models


Abstract

Code-switching is pervasive in multilingual communities around the world, but it remains a challenge for Natural Language Processing (NLP) systems due to the lack of suitable data and processing techniques. Hindi-English code-switched text on social media is often transliterated into the Latin script, which prevents the use of monolingual resources available in the native Devanagari script.

This thesis proposes a method to normalize and back-transliterate code-switched Hindi-English text. In addition, we present a grapheme-to-phoneme (G2P) conversion technique for romanized Hindi data. As part of this project, we also release a dataset of script-corrected Hindi-English code-switched sentences labeled for named entity recognition and part-of-speech tagging to facilitate further research in this area. The techniques presented in this thesis aim to benefit downstream NLP applications such as named entity recognition, speech processing, and conversational systems.


3:30PM - 5:00PM CT
Online via Zoom

Dr. Thamar Solorio, thesis advisor

Faculty, students and the general public are invited.