Shrinking Production Incidents
When: Monday, March 2, 2020
Where: PGH 563
Time: 11:00 AM
Speaker: Annalee Nagami, Google
Host: Dr. Omprakash Gnawali
“Hope is not a strategy” - Google Site Reliability Engineering motto
For large-scale systems, the question is not whether something will go wrong, but when. Site Reliability Engineers manage this risk. This talk will outline strategies for detecting problems, mitigating their effects, shortening their duration, and reducing their frequency.
Annalee Nagami is a Site Reliability Engineer at Google. Her job is to keep the account management infrastructure highly available. Annalee graduated from the University of Houston in 2011 with a Bachelor of Science in Computer Science. She started her career working in Compiler Support at Intel. In 2014, Annalee joined Google to develop internal tools. She built integration testing infrastructure and led initiatives to standardize best practices for correctness testing. In early 2019, she joined SRE to pursue her interest in building large-scale production systems.