When: Monday, September 30, 2019
Where: PGH 563
Time: 11:00 AM
Incremental Machine Learning Models Using a Summarization Matrix for Large Datasets
Sikder Tahsin Al Amin, Ph.D. Student
Big data analytics generally relies on parallel processing in large computer clusters. However, this approach is not always the best. CPU speed and RAM capacity keep growing, making small computers faster and more attractive to the analyst. Machine Learning (ML) models are generally computed on a data set obtained by aggregating, transforming, and filtering big data, and that data set is orders of magnitude smaller than the raw data. Users prefer “easy” high-level languages like R and Python, but these languages have memory and speed limitations.
Finally, data summarization has long been a fundamental technique in data mining, and it holds great promise for big data. With that motivation in mind, we adapt the Γ (Gamma) summarization matrix to work in the R language. Γ is significantly smaller than the data set and works well for a remarkably wide spectrum of ML models, including supervised and unsupervised models. We focus on the incremental computation of four fundamental and complementary machine learning models: Linear Regression, Principal Component Analysis, Naive Bayes, and K-means. An extensive experimental evaluation shows that our incremental models on summarized data sets are accurate and that their computation is significantly faster than R built-in functions and Spark.
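The idea of computing a model incrementally from a small summary can be sketched as follows. This is only an illustration, not the speaker's R implementation: it assumes the summary is the matrix Z^T Z, where each row of Z is [1, x_1, ..., x_d, y], and it uses Python with NumPy. The function names, chunk size, and synthetic data are all hypothetical.

```python
import numpy as np

def gamma_of_chunk(X, y):
    """Return Z^T Z for one chunk, where each row of Z is [1, x_1..x_d, y]."""
    Z = np.column_stack([np.ones(len(X)), X, y])
    return Z.T @ Z

def linear_regression_from_gamma(G):
    """Solve the normal equations using only blocks of the summary matrix."""
    A = G[:-1, :-1]  # [1, X]^T [1, X]: count, linear sums, and cross-products
    b = G[:-1, -1]   # [1, X]^T y
    return np.linalg.solve(A, b)

# Synthetic data, purely for illustration (assumed, not from the talk).
rng = np.random.default_rng(0)
n, d = 1000, 3
X = rng.normal(size=(n, d))
true_beta = np.array([2.0, 1.0, -3.0, 0.5])  # intercept followed by slopes
y = true_beta[0] + X @ true_beta[1:] + 0.01 * rng.normal(size=n)

# The summary is additive, so it can be updated one chunk at a time
# without ever holding the full data set in memory.
G = np.zeros((d + 2, d + 2))
for start in range(0, n, 100):
    G += gamma_of_chunk(X[start:start + 100], y[start:start + 100])

beta_incremental = linear_regression_from_gamma(G)

# Reference fit on the full data set, for comparison.
beta_full = np.linalg.lstsq(np.column_stack([np.ones(n), X]), y, rcond=None)[0]
```

Because the summary for the whole data set is the sum of the per-chunk summaries, the regression coefficients obtained from it should agree with a fit on the full data up to floating-point error, while the summary itself has only (d + 2) × (d + 2) entries regardless of n.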
Sikder Tahsin Al Amin is a 3rd-year PhD student in Computer Science at the University of Houston. He is advised by Dr. Carlos Ordonez and his research interest lies within Big Data Analytics with Graph Theory and Machine Learning. He received his Bachelor's degree in Computer Science and Engineering from Khulna University of Engineering and Technology (KUET), Bangladesh.
You Are Not Alone: Helping Users Not to Fall for Phish
Shahryar Baki, Ph.D. Student
Email-based attacks are a rich field with well-publicized consequences. Despite a decade of effort to stop these attacks, they still cost companies and Internet users millions of dollars.
Spam filters and end users inevitably fail to detect such attacks at some point. Our goal is to combine detection techniques and user training programs, two fields that are currently treated independently in phishing detection and mitigation.
We show how current Natural Language Generation (NLG) technology allows defenders to diversify their datasets and use them to improve their models. We apply such techniques to build an automated model that warns users about suspicious content in emails. In this way, users and detection models help overcome each other's shortcomings.
Shahryar Baki is a 5th-year Ph.D. student in Computer Science at the University of Houston. He works with Professor Rakesh Verma. Shahryar's main research focuses on utilizing Natural Language Processing tools to improve cyber-security techniques, specifically for phishing emails and websites.
Enhancing Subject Matter Assessments Utilizing Augmented Reality and Serious Game Techniques
Brian Holtkamp, Ph.D. Student
In this work, we utilize the Microsoft HoloLens, a wearable augmented reality (AR) device, to investigate how well an AR-based assessment tool measures a student’s comprehension of, skill in, and aptitude for a given subject matter. We added assessment capabilities to a serious game prototype built in collaboration with Construction Management faculty for their Occupational Safety and Health Administration (OSHA) safety course. The trial consisted of a traditional pen-and-paper exam and an AR-based assessment. The AR-based assessment required the students to identify unsafe situations involving virtually simulated workers, construction equipment, and/or vehicles in an AR diorama of an active construction site.
Brian Holtkamp received his Bachelor’s degree in Computer Science from the University of Houston-Downtown in 2013. He is currently a 6th-year Ph.D. student in Computer Science working under Dr. Chang Yun and Dr. Jaspal Subhlok in the VAST (Visualization, Automation, Simulation, and Training) Lab. His research interests are serious games, human-computer interaction, and utilizing computer science to enhance education.