Computer Science Seminar - University of Houston


Large Scale Bayesian Machine Learning with the SimSQL System

When: Friday, November 8, 2013
Where: PGH 232
Time: 11:00 AM

Speaker: Prof. Chris Jermaine, Rice University

Host: Dr. Carlos Ordonez

SimSQL is a parallel database system developed at Rice that supports a dialect of SQL with a few key extensions. For example, SimSQL allows users to use (and also define) sampling distributions that can stochastically generate database tables. These stochastic tables can be defined recursively, so that an older version of a stochastic table can be used to parameterize a newer version. Taken together, these extensions make it easy to use SimSQL to simulate database-valued Markov chains (that is, chains whose state at each time tick is embodied by a relational database). This capability has many potential uses. One is performing distributed Markov chain Monte Carlo (MCMC) simulations over very large data sets; MCMC is the standard inference method for Bayesian machine learning. In this talk, I will describe SimSQL's SQL dialect and give examples showing how easy it is to use SimSQL to write Bayesian inference codes that are small and implicitly parallel. I will also describe some of the key implementation methods SimSQL uses to compile and execute parallel MCMC codes over large data sets.
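
The core idea of a database-valued Markov chain, where version i+1 of a stochastic table is sampled from a distribution parameterized by version i, can be sketched outside SimSQL. The Python below is a minimal illustration only; it is not SimSQL's SQL dialect or implementation. It assumes a hypothetical hierarchical normal model (group means mu_g drawn around a global mean m, with known variances SIGMA2, TAU2 and S0SQ) and runs a Gibbs sampler whose state is a small "table" of per-group means. The names DATA, summarize, next_state and run_chain are all hypothetical.

```python
import math
import random
from collections import defaultdict

# Hypothetical observed data table: rows of (group, value).
DATA = [("a", 1.2), ("a", 0.8), ("a", 1.1), ("b", 3.9), ("b", 4.2), ("c", 2.0)]

SIGMA2 = 1.0   # assumed known observation variance
TAU2 = 4.0     # assumed known variance of group means around m
S0SQ = 100.0   # assumed prior variance of the global mean m (prior mean 0)


def summarize(data):
    """Per-group count and sum, analogous to a GROUP BY over the data table."""
    stats = defaultdict(lambda: [0, 0.0])
    for group, value in data:
        stats[group][0] += 1
        stats[group][1] += value
    return dict(stats)


def next_state(state, stats, rng):
    """Sample version i+1 of the stochastic state, parameterized by version i."""
    mus = state["mu"]
    # Gibbs step 1: draw the global mean m given the previous group means.
    prec_m = len(mus) / TAU2 + 1.0 / S0SQ
    mean_m = (sum(mus.values()) / TAU2) / prec_m
    m = rng.gauss(mean_m, math.sqrt(1.0 / prec_m))
    # Gibbs step 2: draw each group mean given the new m and that group's data.
    new_mus = {}
    for group, (n, total) in stats.items():
        prec = n / SIGMA2 + 1.0 / TAU2
        mean = (total / SIGMA2 + m / TAU2) / prec
        new_mus[group] = rng.gauss(mean, math.sqrt(1.0 / prec))
    return {"m": m, "mu": new_mus}


def run_chain(data, iterations, seed=0):
    """Run the chain and keep every version of the state."""
    rng = random.Random(seed)
    stats = summarize(data)
    state = {"m": 0.0, "mu": {g: 0.0 for g in stats}}  # version 0
    history = [state]
    for _ in range(iterations):
        state = next_state(state, stats, rng)
        history.append(state)
    return history
```

For example, `run_chain(DATA, 2000)` returns 2001 versions of the state; averaging `mu["b"]` over the later versions estimates the posterior mean of group b's mean. In a system like SimSQL, each version would be a relational table and the per-group updates would run in parallel; here they are a plain Python loop for clarity.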

Bio:
Chris Jermaine is an associate professor of computer science at Rice University in Houston, Texas. His research is at the intersection of data management and applied statistics. He is the recipient of an Alfred P. Sloan Foundation Research Fellowship, a National Science Foundation CAREER award, and a SIGMOD Best Paper Award.