[Defense] Methodology for Evaluating and Interpreting Neural Code Intelligence Models
Friday, April 22, 2022
9:00 am - 11:00 am
Md Rafiqul Islam Rabin
will defend his proposal
Methodology for Evaluating and Interpreting Neural Code Intelligence Models
Deep neural networks are increasingly used in code intelligence tasks such as code summarization, vulnerability detection, type annotation, and many more. While the performance of neural models for intelligent code analysis continues to improve, we still know little about how reliable these models are on unseen data, how noise in the training data affects them, and which features they learn from input programs. To use such models reliably, researchers often need to reason about the behavior of the underlying models and the factors that affect them. However, this is very challenging because these models are opaque black boxes and usually learn from noise-prone data sources (e.g., GitHub). State-of-the-art approaches are also often specific to a particular set of architectures and require access to the model’s parameters, which hinders reliable adoption by the average programmer. To this end, we propose simple model-agnostic approaches to evaluate the generalization performance of neural code intelligence models and interpret their predictions. The overarching goal of this research is to better understand model inference in terms of generalizability, memorization, and interpretability.
Firstly, we evaluate the generalizability of models on unseen data with respect to semantic-preserving program transformations. Secondly, we investigate the extent of memorization in models by injecting random noise into the original training dataset and using several metrics to quantify the impact of noise on various aspects of training and testing. Thirdly, we identify critical input features to interpret the models’ predictions through prediction-preserving program simplifications. Our results suggest that neural code intelligence models are often vulnerable to very small semantic-preserving changes, usually rely on only a few tokens for making predictions, and can memorize noisy data with excessive parameters, thus suffering in generalization performance.
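To make the idea of prediction-preserving program simplification concrete, the following is a minimal sketch only. It assumes a greedy one-token-at-a-time reduction, which is a simplified stand-in and not necessarily the reduction procedure used in the proposal. The names `simplify` and `toy_predict` are hypothetical, and `toy_predict` is a trivial rule standing in for a trained model.

```python
from typing import Callable, List


def simplify(tokens: List[str], predict: Callable[[List[str]], str]) -> List[str]:
    """Greedily drop tokens while the model's prediction stays unchanged."""
    target = predict(tokens)  # prediction on the original input
    kept = list(tokens)
    changed = True
    while changed:  # repeat until no single token can be removed
        changed = False
        for i in range(len(kept)):
            candidate = kept[:i] + kept[i + 1:]
            # Never reduce to an empty program; keep the removal only if
            # the prediction is the same as on the original input.
            if candidate and predict(candidate) == target:
                kept = candidate
                changed = True
                break
    return kept


def toy_predict(tokens: List[str]) -> str:
    """Hypothetical stand-in for a trained model: keys on a single token."""
    return "sort" if "swap" in tokens else "other"


program = "void f ( int [ ] a ) { swap ( a , 0 , 1 ) ; }".split()
reduced = simplify(program, toy_predict)
print(reduced)
```

With this toy model, the reduced input contains only the token the prediction depends on, which mirrors the kind of reliance on a few tokens that the abstract reports. A real study would query an actual trained model and would typically require the reduced program to remain syntactically valid, which this sketch does not check.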
9:00AM - 11:00AM CT
Virtual via MS Teams
Dr. M. Amin Alipour, dissertation advisor
Faculty, students and the general public are invited.