DevOps Days Berlin 2017
DevOps Days is a popular series of events that has been hosted around the world for the last 8 years, with events from Auckland to Kiev. I recently had a talk accepted for the Berlin event on a topic with which I have extensive experience: on-call engineering.
In a past life, before joining Opsview, I spent 8 years across 3 organisations working with some form of on-call responsibility. I have both been part of and run multiple on-call rotations, so I was excited to be given the opportunity to speak about the ways I’ve improved the oft-difficult task of out-of-hours issue management.
My talk focused on the journey I’ve taken with on-call and how a reasoned, lightweight process can help a team optimise its on-call responsibility. It also covered how a team can create a better interface with management and champion its own needs while still delivering better services to the business it supports. The talk was named “Getting paid to sleep – Trying to fix on-call.” In the near future I will write up elements of my talk, with the goal of offering insight that will make other on-call rotations easier.
During the rest of the conference we heard a series of talks from a wide range of people. We heard from engineering leaders from major software vendors and software engineers from the ever-brilliant Government Digital Service. All of these talks offered interesting and exciting insight into the range of problems a DevOps-minded organisation can face and explored some excellent concepts:
The Pit of Success – It’s a Good Thing
Encouraging people to fall into the pit of success is a concept that was discussed at length on the day. The idea is to design your solutions so that the best action is also the most obvious action. This concept extends to functional architecture, and ideally most things should be designed from this perspective. If you plan for the only option to be the best option, you’ll likely spend less time worrying about edge cases.
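As a small illustration of the idea, here is a minimal Python sketch in which the safe behaviour is what you get by default and the destructive behaviour has to be asked for explicitly. The function name `purge` and its `confirm` flag are hypothetical, made up for this example rather than taken from any talk or tool.

```python
def purge(records, should_remove, *, confirm=False):
    """Return (remaining, affected).

    Hypothetical example: by default nothing is removed and the caller
    only sees which records *would* be affected. Removal happens only
    when confirm=True is passed explicitly by keyword.
    """
    affected = [r for r in records if should_remove(r)]
    if not confirm:
        # The obvious call is the safe one: report only, change nothing.
        return list(records), affected
    remaining = [r for r in records if not should_remove(r)]
    return remaining, affected


if __name__ == "__main__":
    hosts = ["web-1", "web-2", "db-1"]
    is_web = lambda name: name.startswith("web")
    print(purge(hosts, is_web))                # preview only
    print(purge(hosts, is_web, confirm=True))  # explicit opt-in to removal
```

Someone calling `purge` without reading the documentation ends up with a preview rather than lost data, which is the pit of success in miniature.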
Culture of Fast Failure – It’s Also a Good Thing
Throughout the day, the benefits of a culture that accepts failure were celebrated repeatedly. This is a requirement for any agile team, department, or business that seeks to fail fast. It’s important not to fixate on the failure itself, other than to understand what didn’t work and which path might lead to a future success.
Modern Incident Management with Slack
In many businesses, the standard process for a major incident involves getting everyone onto a conference call and talking through the effects of the issue and the proposed steps to resolution. This process can intrude on schedules and productivity, and can effectively tie the hands of the engineers you are relying on to tackle the problem. Taking a more dynamic approach, progressive companies and DevOps cultures instead use asynchronous, real-time communications platforms like Slack to communicate throughout the incident. This allows engineers to focus on the problem while communicating out-of-band with the incident owners. It also provides a log that can be used for reference and for creating incident summary reports.
DevOps Days also has extensive breakout space, allowing concepts to be expanded on outside of the talks.
All in all, DevOps Days Berlin was really fun. I had a great time getting to know other industry experts and hearing about the everyday problems developers, operators, administrators, and engineers all face. My favourite comment from the event came from another attendee, who said it was like having 20 consultants in a room who could tackle each other’s problems collaboratively and quickly.
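To show how a chat log can feed an incident summary, here is a minimal Python sketch that turns a list of exported channel messages into a chronological timeline. The `Message` type, its field names, and the `summarise_incident` function are assumptions made for this example. It does not call Slack, and a real export would need mapping into this shape first.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Message:
    """Hypothetical shape of one exported chat message."""
    ts: float   # Unix timestamp in seconds
    user: str
    text: str


def summarise_incident(messages):
    """Return a plain-text timeline, oldest message first."""
    lines = []
    for m in sorted(messages, key=lambda m: m.ts):
        stamp = datetime.fromtimestamp(m.ts, tz=timezone.utc).strftime("%H:%M:%S")
        lines.append(f"{stamp} UTC  {m.user}: {m.text}")
    return "\n".join(lines)


if __name__ == "__main__":
    log = [
        Message(60.0, "bob", "Rolling back the last deploy"),
        Message(0.0, "alice", "API error rate is climbing"),
    ]
    print(summarise_incident(log))
```

Because the channel already holds every update with a timestamp, the summary report becomes a formatting job rather than an exercise in reconstructing events from memory after the call.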
If you have your own questions about DevOps, please reach out to us today. We’re excited to talk about our own DevOps culture here at Opsview, and how it’s helped us develop Opsview Monitor.