Monitoring Your Data Center Like a Google SRE

John Jainschigg explains how to monitor your data center like a Google Site Reliability Engineer
Apr 23, 2018

Back in the early ’00s, when Google was beginning to expand its portfolio of services beyond search, it encountered a combination of challenges. Some of these emerged from familiar, classic disconnects between developers and operations folks, or between IT services and line-of-business owners. Others were brand-new, never-before-seen failure modes that arose from providing services on novel cloud platforms, and from doing so at planetary scale.

To confront these challenges, Google began evolving a discipline called Site Reliability Engineering (SRE), and in 2016 the company published a very useful and fascinating book about it. SRE and DevOps (at least the contemporary version of DevOps that has expanded into a vision for how IT operations should work in the cloud era) share a lot of conceptual DNA and an increasing amount of practical DNA. This is particularly true now that cloud software and tooling have evolved to let ambitious teams begin emulating parts of Google’s infrastructure with open source software such as Kubernetes. Google has used the statement “Class SRE implements DevOps” as the title of a new (and growing) video playlist by Liz Fong-Jones and Seth Vargo of Google Cloud Platform, which shows how and where these disciplines connect and nudges DevOps practitioners to consider some key SRE insights.
