On the Naturalness of Software, and How to Exploit It
When: Monday, April 27, 2020
Where: PGH 232
Time: 11:00 AM
Speaker: Dr. Prem Devanbu, University of California-Davis
Host: Dr. Amin Alipour
While natural languages are rich in vocabulary and grammatical flexibility, most human utterances are mundane and repetitive. This repetitiveness in natural language has enabled great advances in statistical NLP methods.
At UC Davis, we discovered back in 2012 that, despite the considerable power and flexibility of programming languages, large software corpora are actually even more repetitive than natural-language corpora. We were the first to show that this “naturalness” of code could be captured in statistical models and exploited within software tools. The field has since blossomed, with numerous applications: de-obfuscation, code synthesis, defect finding, and more. New groups have formed at Facebook, Microsoft, and Google, as well as at several startups. In this talk, we will introduce our earlier findings, some recent results exploring the science of why code in the wild is so repetitive, and some new ways of training deep-learning models to correct student code.
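As a rough intuition for how repetitiveness is "captured in statistical models": a language model assigns probabilities to token sequences, and a more repetitive corpus is more predictable, so it scores a lower cross-entropy (fewer bits per token). The sketch below is a minimal, hypothetical illustration using an add-one-smoothed bigram model; it is not the speaker's actual methodology, which used larger n-gram and neural models over real software corpora.

```python
import math
from collections import Counter

def bigram_cross_entropy(train_tokens, test_tokens):
    """Average bits per token of test_tokens under an
    add-one-smoothed bigram model fit on train_tokens."""
    vocab = set(train_tokens) | set(test_tokens)
    bigrams = Counter(zip(train_tokens, train_tokens[1:]))
    unigrams = Counter(train_tokens)
    total_bits = 0.0
    for prev, cur in zip(test_tokens, test_tokens[1:]):
        # Laplace (add-one) smoothing so unseen bigrams get nonzero mass.
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + len(vocab))
        total_bits += -math.log2(p)
    return total_bits / (len(test_tokens) - 1)

# A highly repetitive token stream (toy stand-in for a code corpus):
repetitive = "x = x + 1 ;".split() * 40
rep_H = bigram_cross_entropy(repetitive[:120], repetitive[120:])

# A stream with no repeated bigrams at all, for contrast:
varied = [f"t{i}" for i in range(240)]
var_H = bigram_cross_entropy(varied[:120], varied[120:])

print(f"repetitive: {rep_H:.2f} bits/token, varied: {var_H:.2f} bits/token")
```

The repetitive stream yields far fewer bits per token than the varied one; the talk's finding is that real code behaves much more like the first stream than natural-language text does.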
Prem Devanbu received his B.Tech from IIT Madras, and his Ph.D. in Computer Science from Rutgers University under Alex Borgida. After working at Bell Labs and its various offshoots in New Jersey for many years, he joined the faculty of UC Davis in 1997. He is an ACM Fellow.