Computer Science Seminar - University of Houston

Computer Science Seminar

Algorithms for Large Data Analytics via Coresets and Sketches

When: Monday, March 7, 2016
Where: PGH 232
Time: 11:00 AM

Speaker: Dr. Jeffrey Phillips, University of Utah

Host: Prof. Gopal Pandurangan

Over the last decade, many companies and scientists have been generating enormous quantities of data, yet they often lack the facilities to properly collect, annotate, or analyze it. An emerging approach to this problem is to create coresets and sketches of the data. These are powerful summaries that can be maintained efficiently and, for important aspects of the data, queried much like the original data, but far more efficiently and with bounded error. Impressively, the sizes of the summaries depend only on the error in the approximation guarantees.

In this talk, I will discuss my work in developing algorithms for coresets and sketches central to data analysis, as well as some of the broader computational and analytical consequences of working with them. I will focus on two classes of summaries. The first is a sketch for matrices, called Frequent Directions (FD). Matrix sketching is the most common preprocessing technique for many enormous data sets used in machine learning and data mining. FD is efficient to construct and applies in general settings, and it provides the best trade-off between sketch size and approximation error for common error measures, an order of magnitude better than sketches based on random projections or random sampling.
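To make the idea of a matrix sketch concrete, here is a minimal sketch of a commonly described variant of Frequent Directions, assuming NumPy is available. It is not necessarily the exact version presented in the talk. The function name `frequent_directions`, the parameter `ell`, and the choice of a buffer of 2*ell rows are illustrative assumptions. For this variant, the sketch B satisfies ||A^T A - B^T B||_2 <= ||A||_F^2 / ell.

```python
import numpy as np

def frequent_directions(A, ell):
    """Stream the rows of A into a (2*ell) x d sketch B.

    Illustrative variant (an assumption, not taken from the talk):
    rows are buffered, and when the buffer is full the singular
    values are shrunk so that the bottom half of B becomes zero.
    """
    n, d = A.shape
    B = np.zeros((2 * ell, d))
    next_free = 0  # index of the next all-zero row in B
    for row in A:
        if next_free == 2 * ell:
            # Buffer is full: rotate with an SVD and shrink.
            _, s, Vt = np.linalg.svd(B, full_matrices=False)
            # Subtract the (ell+1)-th squared singular value from all of them.
            delta = s[ell] ** 2 if len(s) > ell else 0.0
            s_shrunk = np.sqrt(np.maximum(s ** 2 - delta, 0.0))
            B[:] = 0.0
            B[:len(s)] = s_shrunk[:, None] * Vt
            next_free = ell  # rows ell..2*ell-1 are now zero
        B[next_free] = row
        next_free += 1
    return B

# Small demonstration on synthetic data.
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 40))
ell = 10
B = frequent_directions(A, ell)

# Covariance error in the spectral norm, and the bound for this variant.
cov_error = np.linalg.norm(A.T @ A - B.T @ B, 2)
bound = np.linalg.norm(A, "fro") ** 2 / ell
```

The sketch uses a fixed amount of memory regardless of the number of rows in A, and the bound depends only on `ell`, which mirrors the point that summary size is governed by the desired error rather than by the size of the data.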

The second is a coreset for kernel density estimates, the dominant way to model noisy spatial data. In this setting we also show significant improvements over simple random sampling approaches. Moreover, we show that this coreset can be used to preserve worst-case L_∞ bounds necessary for preserving anomalous events, important spatial patterns, and even topological properties of the data.
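As a rough illustration of the error measure involved, the following sketch builds a Gaussian kernel density estimate from a full point set and from a simple random sample, then measures the largest difference between the two over a grid. It assumes NumPy and shows only the random-sampling baseline, not the coreset construction from the talk. The function names, the bandwidth `sigma`, the synthetic data, and the evaluation grid are all illustrative assumptions. The kernel is left unnormalized so that its values lie in [0, 1].

```python
import numpy as np

def gaussian_kde(points, queries, sigma):
    """Average of unnormalized Gaussian kernels centered at `points`,
    evaluated at each 1-D query location."""
    diffs = queries[:, None] - points[None, :]
    return np.exp(-diffs ** 2 / (2.0 * sigma ** 2)).mean(axis=1)

def linf_kde_error(P, Q, queries, sigma):
    """Largest absolute difference between the KDEs of P and Q over
    `queries`. Because only a finite grid is checked, this is a lower
    bound on the true worst-case (L_infinity) error."""
    return float(np.max(np.abs(gaussian_kde(P, queries, sigma)
                               - gaussian_kde(Q, queries, sigma))))

# Synthetic 1-D data drawn from two Gaussian clusters (hypothetical).
rng = np.random.default_rng(0)
P = np.concatenate([rng.normal(-2.0, 0.5, 2000),
                    rng.normal(1.0, 1.0, 2000)])

# Baseline summary: a uniform random sample of 200 points.
Q = rng.choice(P, size=200, replace=False)

grid = np.linspace(-6.0, 6.0, 601)
sigma = 0.5
sample_error = linf_kde_error(P, Q, grid, sigma)
```

A coreset in the sense of the talk would play the role of `Q`, chosen more carefully than a random sample so that the same error is reached with fewer points.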

Bio:

Jeff M. Phillips is an Assistant Professor in the School of Computing at the University of Utah. He works on large-scale algorithms and geometric data analysis. He is the Director of the Data Management and Analysis Track within the School, which oversees all computing-related data science educational programs at the university. Dr. Phillips is supported by several NSF grants, including an NSF CAREER Award. Before joining the faculty, he held a 2-year NSF Computing Innovations Postdoctoral Fellowship, also at the University of Utah. He received his Ph.D. in Computer Science (focusing on Algorithms, Data Mining, and Computational Geometry) from Duke University in 2009, while on an NSF Graduate Research Fellowship. He completed his undergraduate degree in Computer Science and in Math at Rice University in 2003, where he developed a lifelong love for Tex-Mex and barbecue.