What we ‘observed’ at Monitorama PDX 2018

Monitorama PDX 2018 was a great show, produced by a great community

Held in Portland, Oregon (USA) at the Portland Armory’s Gerding Theatre, June 4-6, Monitorama PDX offered an amazingly intense, well-curated three-day program of technically- and/or philosophically-deep and critically-relevant presentations about monitoring (and related topics), by and for expert practitioners.

Monitorama’s roots are in open source monitoring. Although (like Percona Live) the organizers have expanded sponsorship and the show floor to include select commercial and SaaS products and technologies, the event is still robustly open source, and at heart it centers on practitioners and attendees rather than vendors. The organizers try to keep the speaking roster diverse and to make people from all over feel included.

We’ve prepared a breakdown of the program, with capsule descriptions of the sessions. Click on the links for high-quality (captured livestream) video of the sessions you want to view.

Monitorama PDX 2018 Highlights - Day 1

31:23 - Logan McDonald, an SRE at BuzzFeed, offers a deep personal and organizational look at how to master new monitoring technologies, concepts, and organizational requirements. If you’ve newly joined an SRE team or are responsible for bringing new hires up to the point where they’re ready to join an on-call rotation, this is a wonderful, eye-opening guide (and Tip #7 will surprise you!).

1:00:58 - Pam Selle, lead engineer at IOpipe, opened her presentation (called “Serverless and CatOps”) by invoking cats as symbols for IT infrastructure (a notion echoed by Opsview’s demo), then went on to apply this meme to explain some of the benefits and constraints of serverless computing and describe a framework (IOpipe) for wrapping AWS Lambda functions for monitoring and observation.

2:05:05 - Zach Musgrave and Angelo Licastro of Yelp offered a tag-team short course in how to mentor metrics engineers and grow the mandate of a monitoring team, best practice for consulting with other teams, and helping an entire organization reach its business goals faster.

2:49:49 - Dawn Parzych, Technology Marketing Director at Catchpoint, presented on “The Power of Storytelling” -- providing a guide for monitoring engineers on how to communicate, interrogate, make sense of what you’re hearing from stakeholders, help them learn from you, and persuade them.

4:51:48 - Jamie Wilkinson, SRE at Google, gave a talk called (in a nod to Isaac Newton) “Principia SLOdica (A Treatise on the Metrology of Service Level Objectives).” A highlight of this talk (for me, anyway) comes when Wilkinson discusses exactly where and how alerts need to be generated for this kind of ‘low alert fatigue/highest-value alert response’ system to work. Wilkinson also talks about detailed monitoring specifications as a form of technical debt, which is eye-opening.

5:40:26 - Franka Schmidt, from open source mapping/geo project Mapbox, offered a look inside how her organization onboards new monitoring engineers to its on-call rotation, and how she’s gamifying the process.

6:36:32 - Peter Bourgon, of edge cloud platform provider Fastly, offers a ton of actionable technical tips for instrumentation and documentation, plus clues for how to work as an SRE in an org that favors engineering autonomy.

7:09:37 - Aditya Mukerjee, Observability Engineer at Stripe, gave a very funny (in places, and quite grim in others) talk, discussing the effect of alert fatigue in clinical healthcare systems, then applying this info to IT monitoring.

7:44:19 - Ian Bennett, Software Engineer with Twitter Observability, filed his talk, titled “Monitory Report,” via pre-recorded video because he’d just gotten married. He describes Twitter’s recent long migration to a new monitoring system and regime, and concludes by discussing Sekhmet, Twitter’s new open source toolkit for diagnosing system health in real time.

Monitorama PDX 2018 Highlights - Day 2

17:28 - Kishore Jalleda, SRE lead at Microsoft, discussed (mostly) his adventures at prior gigs, notably at Zynga, where he was the first to implement a modern SRE program with error budgets, penalties, and distributed ownership of monitoring and alerting. His talk focuses in part on creating the right incentives for cooperation.
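
For readers new to the term, an error budget is simply the amount of failure a service level objective allows over some window. Here is a minimal sketch of the arithmetic in Python. It is not taken from Jalleda’s talk; the function name and the example figures are hypothetical.

```python
def error_budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Return the fraction of the error budget still unspent (negative if overspent).

    slo_target is the promised success ratio, e.g. 0.999 for "three nines".
    All names here are illustrative, not from any particular SRE toolkit.
    """
    # Number of failures the SLO tolerates for this much traffic.
    allowed_failures = (1.0 - slo_target) * total_requests
    if allowed_failures <= 0:
        raise ValueError("no error budget: SLO is 100% or there was no traffic")
    return 1.0 - failed_requests / allowed_failures


# Hypothetical month: 1,000,000 requests, 250 failures, 99.9% SLO.
remaining = error_budget_remaining(0.999, 1_000_000, 250)
print(f"{remaining:.0%} of the error budget is left")
```

A team that has spent most of its budget might slow down releases, while a team with plenty left can afford to take more risks. That trade-off is where incentives come into play.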

44:37 - Peter Bailis, Assistant Professor of Computer Science at Stanford, introduced DAWN (Data Analytics for What’s Next) -- a project/framework for creating dynamic, task-optimized JIT toolchains of filtering, machine learning and other types of agents to perform given analytic tasks with orders-of-magnitude greater efficiency than conventional solutions. Bailis concludes his talk by discussing how DAWN technologies can, in principle, be applied to recognizing and alerting on conditions in planet-scale metrics deployments.

1:47:43 - Aruna Sankaranarayanan, who works in Bangalore with an otherwise mostly US-based incident response team at Mapbox, gave a great presentation about how she and colleagues have wired up Lambda with other components to create Slack bots that help coordinate conversations about issues, provide context, drive background ticket systems and perform other services.

2:28:16 - Andy Domeir, Director of System Operations at SPS Commerce, talked about automating context as a way of improving confidence in ability to resolve incidents and make decisions. His talk invokes the work of Edward Tufte and other dataviz experts, and closes by demonstrating an interactive network traffic visualization based on Netflix’s Vizceral WebGL animation framework.

4:47:44 - George Luong, Visibility Infrastructure Engineer at Slack (in his first conference talk, ever!) discussed how and why Slack adopted and how they now use Prometheus.

5:35:47 - Tapasweni Pathak, engineer on Mapbox’s Platform Team, talked about how her team built a system called Sparky (actually ‘Sparky the Fire Dog’) to help manage, contextualize, and respond to alerts, using Amazon Simple Notification Service (SNS), Node.js and similar tools.

6:32:09 - Megan Kanne, Engineer at Twitter, talked about how to automate analysis of canary deployments -- a great way of avoiding unanticipated issues with rollouts.

Monitorama PDX 2018 Highlights - Day 3

28:52 - Morgan McLean is PM for event tracking, code profiling and other things at Google. He spoke about OpenCensus -- a project for instrumentation and telemetry based on Google’s internal Census system. OpenCensus lets anyone instrument their code to produce distributed traces, tags, and time-series metrics, and to extract logs.

1:02:23 - Yan Cui, Principal Engineer at London-based sports streaming host DAZN, spoke about the future of serverless observability. He envisions future tools enabling parallel tracing and debugging of many asynchronous components using time-series data, graph queries, and other analytic and introspective paradigms.

2:00:38 - Prateek Rungta, Engineer on Uber’s M3 Team, describes Uber’s mostly home-grown metrics and monitoring system, parts of which are now being open sourced.

2:49:47 - Mercedes Coyle, Software Engineer at Sensu, discusses the thrills and chills of building and testing a popular open source monitoring tool. She offers many best-practice take-aways that will help you avoid gotchas with popular techniques and platforms.

5:00:32 - Allan Espinosa, SRE at Bloomberg.com and author of Docker High Performance and the video Deploying and Running Docker Containers, did a compelling talk on autoscaling containers using control theory math (this is the math of feedback loops, automobile engine governors, A/C temperature controls and similar stuff).
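
To give a flavor of what a feedback loop looks like in code, here is a minimal sketch of a proportional-integral (PI) controller that nudges a replica count toward a target CPU utilization. It is an illustration only, not Espinosa’s implementation; the class name, the gain values, and the replica limits are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PIAutoscaler:
    """Toy proportional-integral controller for a replica count (illustrative only)."""
    target_utilization: float      # desired average CPU utilization, e.g. 0.6
    kp: float = 10.0               # proportional gain (hypothetical value)
    ki: float = 2.0                # integral gain (hypothetical value)
    min_replicas: int = 1
    max_replicas: int = 50
    _integral: float = 0.0         # running sum of past errors

    def step(self, measured_utilization: float, current_replicas: int) -> int:
        # Positive error means the service is running hotter than the target.
        error = measured_utilization - self.target_utilization
        self._integral += error
        # Proportional term reacts to the current error,
        # integral term reacts to error that persists over time.
        adjustment = self.kp * error + self.ki * self._integral
        desired = round(current_replicas + adjustment)
        # Clamp so a noisy reading cannot scale to zero or without bound.
        return max(self.min_replicas, min(self.max_replicas, desired))


scaler = PIAutoscaler(target_utilization=0.6)
print(scaler.step(measured_utilization=0.9, current_replicas=10))
```

In a real system the gains would need tuning against the service’s actual response to scaling, which is exactly the kind of question control theory helps answer.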

5:32:19 - Kale Stedman, of planet-scale gaming service provider Demonware (a subsidiary of Activision Blizzard), talks about how they tried to build an auto-remediation system, but then found better solutions that leverage smart people.

6:45:37 - Dave Cadwallader, Sr. Architect at sequencing and precision-medicine org DNAnexus (and founder of the TestArmada project, while he was at Walmart Labs), talked about using observability to measure and provide security and compliance.

7:17:30 - Beth Cornils, Product Manager for Terraform at HashiCorp (they also make Consul, Vault, and other things), closed out the Monitorama PDX 2018 program with a high-engagement speech about diversity and inclusion. It was a very appropriate capstone for this unique show, which so obviously stressed diversity in its planning (look up at all those female speakers, for starters), and for this unique monitoring/SRE/DevOps industry, which is showing the way for tech to be better, every day.

 

 

by John Jainschigg,
Technical Content Marketing Manager
John is an open cloud computing and infrastructure-as-code/DevOps advocate. Before joining Opsview, John was Technical Marketing Engineer at OpenStack solutions provider, Mirantis. John lives in New York City with his family, a pariah dog named Lenny, and several cats. In his free time, John enjoys making kimchi, sauerkraut, pickles, and other fermented foods, and riding around town on a self-balancing electric unicycle.
