Avoid Data Disasters with Observability

There’s a lot being said about observability these days, particularly about the difference between monitoring and observability. What isn’t consistently highlighted is the importance of having both. Even better, why not get both in one place?

Monitoring is the systematic watchdog that ensures you know when things aren’t running well or aren’t running at all. Thoughtful monitoring has operational intelligence built in (like Opspacks) to determine what’s relevant for alerting and what’s not. Observability refers to the ability to see or figure out what’s causing the event that triggered an alert, as with the Investigate option in Opsview Monitor.

As seen in Figure 1 below, the Opsview Monitor Host Navigator lets you investigate directly from the monitored system.

Figure 1: Opsview Monitor Host Navigator

Data disasters are typically corruption of some sort or total data loss. They are almost always preceded by an event or series of events that a monitoring implementation would have notified you about. Let’s say you had the monitoring in place and received some alerts. Without some ability to investigate (observe) the details of an alert, many admins and operators are tempted to dismiss it as not a high priority at the time.

Corruption can come from failing hard disks, bad power supplies, or a multitude of other issues with physical systems. Data can also be corrupted by misbehaving applications. Knowing when your CPU, RAM or disk usage is beyond its thresholds can be the first preventative measure you take against data corruption.
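To make the threshold idea concrete, here is a minimal Python sketch of a disk usage check. It is not how Opsview Monitor or its Opspacks implement their checks. The function names and the 80%/90% thresholds are assumptions chosen for illustration, and the 0/1/2 status codes follow the convention used by Nagios-compatible plugins.

```python
import shutil

# Status codes in the style of Nagios-compatible plugins.
OK, WARNING, CRITICAL = 0, 1, 2
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def classify(used_percent, warn=80.0, crit=90.0):
    """Map a usage percentage to a status code.

    The warn/crit defaults are hypothetical; tune them for your estate.
    """
    if used_percent >= crit:
        return CRITICAL
    if used_percent >= warn:
        return WARNING
    return OK


def disk_used_percent(path="/"):
    """Return the percentage of disk space in use at the given path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


# Check the root filesystem and report its state in one line.
pct = disk_used_percent("/")
status = classify(pct)
print(f"DISK {LABELS[status]} - {pct:.1f}% used on /")
```

The same classify function could be reused for CPU or RAM readings, since it only needs a percentage and two thresholds.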

There’s also the issue of outright data loss. That’s usually caused by a total system failure or a combination of events. If your monitoring system only monitors infrastructure, it might not represent the whole picture of the status of your estate at a given time. The same is true if it only monitors applications. There are a lot of great monitoring tools out there, but it’s important to have a monitoring platform that can unify all of the metrics and alerts you’re receiving, particularly during a crisis.
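As a rough illustration of what unifying alerts can mean, the Python sketch below merges alerts from an infrastructure source and an application source into one per-host view. The alert fields, host names and sample data are all made up for this example, and it does not reflect Opsview Monitor’s internal data model.

```python
from collections import defaultdict

# Ranking used to decide which state is worse.
SEVERITY = {"OK": 0, "WARNING": 1, "CRITICAL": 2}


def unify(*sources):
    """Group alerts from several sources by host and track the worst state."""
    by_host = defaultdict(lambda: {"worst": "OK", "alerts": []})
    for alerts in sources:
        for alert in alerts:
            entry = by_host[alert["host"]]
            entry["alerts"].append(alert)
            if SEVERITY[alert["state"]] > SEVERITY[entry["worst"]]:
                entry["worst"] = alert["state"]
    return dict(by_host)


# Hypothetical sample alerts from two separate monitoring sources.
infrastructure = [
    {"host": "db01", "check": "disk", "state": "WARNING"},
]
application = [
    {"host": "db01", "check": "query latency", "state": "CRITICAL"},
    {"host": "web01", "check": "http", "state": "OK"},
]

view = unify(infrastructure, application)
for host, entry in sorted(view.items()):
    print(host, entry["worst"], len(entry["alerts"]))
```

Seen separately, the disk warning on db01 might be dismissed. Seen next to the critical application alert on the same host, it becomes part of a single picture that is harder to ignore.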

There are also management tools, like Microsoft System Center Operations Manager, that have some monitoring capability. But that sort of tool is designed for management, not for monitoring. The comprehensive view, the ability to monitor applications, infrastructure and network devices all in one place, is what brings the proper level of detail to an event. The ability to investigate the event directly from within the monitoring platform is what gives you observability.

When you’ve received an alert, you’ll want to investigate the cause of the event. In the Event Viewer you can directly investigate both the host that generated the event and the event itself (in Opsview terms, the Service Check). See Figure 2 below.

Figure 2: Opsview Monitor Service Check

Perspective, whether you pass the alert to someone else or take action yourself, is important. But most people would agree that maintaining healthy data is not a matter of perspective; it’s mandatory. Ongoing data integrity is a matter of insights, alerts and being proactive. Your data isn’t the only thing affected, though. We’ll be talking more about the intersection of monitoring and observability in coming blogs, so be sure to subscribe and keep in touch.

If you’d like to review your current toolset or are interested in how Opsview Monitor can give you both unified insights and observability, contact us today. We love talking to people.

Get unified insight into your IT operations with Opsview Monitor

by Bill Bauman,
Head of Innovation and Strategy
My love for computers and technology started over 20 years ago, when I found myself in a processor development lab. That was the catalyst for a whirlwind of technology-related opportunities. I fixed broken servers at customer locations, designed complex systems, traveled the world talking about virtualization and system performance, helped build a public cloud program, and started telling the story of it all. I love how technology and humanity come together. I work to redefine what we consider system monitoring. I work with some really smart people on strategic direction here at Opsview. It's not just product, it's innovation. I think we're a little different here. A lot of us work in multifunctional roles, myself included. It's intense and it's fun. If you ever want to talk emerging technologies or the future of technology, I love a good conversation. I also love cycling, travel and hemp milk lattes. I call the world home, but most of my bikes live in Portland, Oregon, USA.
