Free Monitoring Solutions: Even Simple Stuff is Hard to Monitor
Fitting out a free monitoring engine for productive daily use can entail choosing among community-provided web UIs, REST APIs, reporting frameworks, database back ends and other components. Scaling to meet enterprise monitoring requirements (assuming that’s really possible) can mean following user-provided recipes for implementing a distributed architecture, enabling clustering and failover, and integrating with other tools.
As noted in our last blog, the result can easily become a “special snowflake”: a fragile, dependency-ridden assemblage that you’re now responsible for documenting, maintaining, and using to meet the needs of your fast-changing business.
Owning a snowflake solution usually comes with some downsides. With nobody (but you, you lucky technologist) curating compatibility of components across versions and resolving breaking changes, updates can become forbiddingly complex, time-consuming, and risky. But forgoing updates to keep the solution continuously available is a bad strategy: it cuts you off from innovation and guarantees that the security of your solution will gradually erode.
Meanwhile, there’s the day-to-day to consider. How do you get all your infrastructure monitored?
Free solutions may offer limited options for ingesting configuration data. Enterprise IT organizations tend to be pretty disciplined: using Configuration Management Databases (CMDBs) and other IT Ops Management tools to document and enforce standards, support efficient IT processes, and keep reliable track of how IT infrastructure is deployed. Availability of comprehensive, correct, and fully up-to-date configuration data should give these organizations a massive edge when it comes to setting up monitoring. To avoid tedious research and error-prone manual configuration entry, just ingest the CMDB directly. Simple!
Except it’s often not so simple. Users of popular free monitoring solutions report limited availability of project- or community-provided tools for exchanging data with leading enterprise CMDB and ITOM platforms. Available integrations are often one-offs: they provide compatibility between one particular version of a monitoring solution and one particular version of an Ops Management platform, but not with later versions of either. Organizations with access to a CMDB may be forced to create (or hire consultants to create) a bespoke integration with their monitoring platform: always a time-consuming process, and one that leaves one more piece of “snowflake” software to maintain in-house.
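To make the idea of a bespoke CMDB integration concrete, here is a minimal Python sketch of its simplest part: reading a CMDB export and turning each record into a host definition for a monitoring tool. Everything specific in it is an assumption made for illustration: the CSV column names, the role names, the role-to-check mapping, and the shape of the resulting host definitions. No real CMDB or monitoring product is implied.

```python
import csv
import io

# Hypothetical CMDB export; the column names are assumptions for illustration.
CMDB_EXPORT = """hostname,ip_address,os,role
web01,10.0.0.11,linux,webserver
db01,10.0.0.21,linux,database
"""

# Hypothetical mapping from a CMDB role to the service checks it should get.
CHECKS_BY_ROLE = {
    "webserver": ["ping", "http"],
    "database": ["ping", "disk"],
}


def hosts_from_cmdb(export_text):
    """Turn CMDB rows into monitoring host definitions (plain dicts)."""
    hosts = []
    for row in csv.DictReader(io.StringIO(export_text)):
        hosts.append({
            "name": row["hostname"].strip(),
            "address": row["ip_address"].strip(),
            # Roles we do not know about fall back to a basic ping check.
            "checks": CHECKS_BY_ROLE.get(row["role"].strip(), ["ping"]),
        })
    return hosts


hosts = hosts_from_cmdb(CMDB_EXPORT)
for host in hosts:
    print(host["name"], host["address"], ",".join(host["checks"]))
```

Even this toy version shows where the maintenance burden comes from: if either side renames a column or a role in a later release, the mapping has to be revisited.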
Automation may be 100% “roll your own.” Ideally, automation can be used to provide a so-called “closed loop” monitoring process, where infrastructure configuration data, access-related secrets (e.g., SSH keys) and other information are ingested, necessary agents are automatically installed and configured on hosts, plugins are installed on the monitoring platform to permit retrieval of metrics from different types of infrastructure and applications, and desired service checks, alerting thresholds, and other information are configured. The closer you can get to closed-loop automation, the more reliable, scalable, and generally useful your monitoring will be. Conversely, anything that prevents or complicates the process, or forces you to perform large parts of configuration manually, will slow you down and introduce errors.
Full closed-loop automation is hard to do. It requires integrating the monitoring platform with the CMDB, plus further integration with deployment tools (e.g., Puppet, Chef, Ansible) and between those tools and the monitoring platform’s REST API or other, more idiomatic provisioning mechanisms. Free monitoring tools seldom if ever offer a complete integration suite supporting closed-loop automation, meaning that you’ll likely need to roll your own from scratch (or budget time for manual configuration, both initially and then each time your infrastructure changes). The job can be made more difficult if your chosen monitoring solution hasn’t committed to, and fully documented, a single REST API standard.
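At the heart of a home-grown closed loop is a reconciliation step: compare what the CMDB says should exist with what the monitoring platform currently watches, and work out what to change. The Python sketch below shows only that comparison, under stated assumptions. The host names are invented, and both input lists stand in for data you would have to fetch yourself from the CMDB and from the monitoring platform's API; installing agents and applying the changes are left out entirely.

```python
def plan_changes(cmdb_hosts, monitored_hosts):
    """Compare desired state (CMDB) with actual state (monitoring platform)."""
    desired = set(cmdb_hosts)
    actual = set(monitored_hosts)
    return {
        "add": sorted(desired - actual),     # in the CMDB, not yet monitored
        "remove": sorted(actual - desired),  # monitored, but gone from the CMDB
        "keep": sorted(desired & actual),    # present on both sides
    }


# Hypothetical inputs: in practice these would come from a CMDB query and
# from the monitoring platform's own API.
plan = plan_changes(
    cmdb_hosts=["web01", "web02", "db01"],
    monitored_hosts=["web01", "old-app01"],
)
print(plan)
```

Running a step like this on a schedule, and feeding the "add" and "remove" lists to your deployment tooling, is what turns a one-time import into a loop. It is also the part you end up owning when no integration suite is provided.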
Options for discovery and discovery-facilitated monitoring may also be limited. Free monitoring tools may or may not provide basic auto-discovery of Linux and/or Windows hosts, plus wizards enabling quick, point-and-click configuration of monitoring for discovered resources. Often, discovery and configuration-wizard tools are provided separately and tend to be extremely simplistic. Using them requires integration or chaining, and careful operation by highly trained IT experts.
Community-provided plugins for accessing metrics are likely available, but scope and quality may vary (and can be hard to judge without experiment). Free monitoring tools with large user communities may accrue hundreds or thousands of community-provided plugins enabling retrieval of metrics from on-premises and cloud-based infrastructure and applications. The quality and currency of these uncurated plugins, however, are not guaranteed. They may not be documented or maintained assiduously. They may not come from trustworthy sources, which is a real consideration: plugins run inside your network and may (unless you resolutely follow best-practice rules to prevent this) be given access to your infrastructure at fairly high permission levels.
Configuring service checks can be complex and require lots of domain knowledge. Just installing a plugin may let you retrieve a range of metrics from a given resource. But knowing which metrics to focus on, how to display and interpret them, and how to alert on them productively requires specialized understanding -- knowledge that not every member of your IT team will possess. Configuring all these details means many further manual steps, or more time-consuming work on home-grown automation.
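As a small illustration of the decisions hiding behind a single service check, the Python sketch below maps a metric value to a status using warning and critical thresholds. The metric names and threshold values are assumptions chosen for the example, not recommendations. Picking sensible numbers for each resource is exactly the domain knowledge the paragraph above describes.

```python
# Hypothetical thresholds: (warning, critical) per metric. The numbers are
# placeholders; choosing real ones takes knowledge of each system.
THRESHOLDS = {
    "disk_used_percent": (80.0, 90.0),
    "load_average_1m": (4.0, 8.0),
}


def evaluate(metric, value):
    """Return OK, WARNING, or CRITICAL for a metric where higher is worse."""
    warning, critical = THRESHOLDS[metric]
    if value >= critical:
        return "CRITICAL"
    if value >= warning:
        return "WARNING"
    return "OK"


for metric, value in [("disk_used_percent", 85.0), ("load_average_1m", 1.2)]:
    print(metric, value, evaluate(metric, value))
```

Multiply this by every metric a plugin exposes, and by every type of host and application you run, and the amount of manual configuration (or automation work) becomes clear.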