Let's Process Information, Not Bits: Architecture's Expensive Data Movement
When: Wednesday, February 05, 2020
Where: PGH 232
Time: 11:00 AM
Speaker: Dr. Andrew A. Chien, The University of Chicago
Host: Dr. Lennart Johnsson
For more than a decade, CMOS technology scaling has continued to increase the cost of data movement relative to computation. Building on insights from the 10x10 project, we have designed a series of accelerator architectures that handle complex representations (regex, automata, parsing, RLE, compressed data) and transform between representations rapidly, with as much as 10,000 times better energy-delay product than conventional cores. These accelerators are flexibly programmable, yet outperform hardwired ASIC designs for regex and automata processing. We describe the properties of these architectures, the Unified Automata Processor (UAP) and the Unstructured Data Processor (UDP), and their performance on regex and automata processing. The accelerators are small and can be easily added to a chip memory hierarchy or storage controller, accelerating many data transformation tasks by more than 16x. Our most recent studies (ACCORDA) integrate these accelerators into the SparkSQL data analytics engine, exploring the power of flexible, cheap data transformation in query optimization and execution. We extend the software architecture, modifying the operator interface (subtype with encoding). ACCORDA enables a new class of encoding optimizations and robust high-performance raw data processing. We evaluate ACCORDA using TPC-H queries on tabular data formats, exercising raw data properties such as parsing and data conversion. The ACCORDA system achieves 2.9x-13.2x speedups, reducing raw data processing overhead to a geomean of 1.2x (20%). In doing so, ACCORDA robustly matches or outperforms prior systems that depend on caching loaded data, while computing on raw, unloaded data. ACCORDA’s encoding-extended operator interface unlocks aggressive encoding-oriented optimizations that deliver an 80% average performance increase over the 7 affected TPC-H queries.
Dr. Andrew A. Chien is the William Eckhardt Distinguished Service Professor and Director of the CERES Center for Unstoppable Computing at the University of Chicago, as well as Senior Computer Scientist at the Argonne National Laboratory. In 2017, Dr. Chien became the 9th Editor-in-Chief of the Communications of the ACM. In 2015, he founded the CERES Center for Unstoppable Computing. From 2005 to 2010, Dr. Chien served as Vice President of Research at Intel Corporation. Previous academic positions include the SAIC Chair in Computer Science and Engineering, and founding Director of the Center for Networked Systems at the University of California at San Diego (1998-2005). While at UCSD, he also founded Entropia, a widely known Internet Grid computing startup. From 1990 to 1998, Dr. Chien was a Professor of Computer Science at the University of Illinois at Urbana-Champaign, where he created the well-known Fast Messages, HPVM, and Windows NT Supercluster systems.
Dr. Chien is a Fellow of the American Association for the Advancement of Science (AAAS), a Fellow of the Association for Computing Machinery (ACM), and a Fellow of the Institute of Electrical and Electronics Engineers (IEEE), and has published over 170 technical papers. His research has been recognized for excellence by numerous awards. Dr. Chien received his Bachelor's in electrical engineering, and his Master's and Ph.D. in computer science, from the Massachusetts Institute of Technology.