Distributed Monitoring

[Figure: Opsview 6 System Diagram]

Overview

Distributed Monitoring is a feature of Opsview that allows checks and notifications to be executed from remote servers, letting you scale your Opsview system by spreading the load and reducing latency. This is useful when:

  • You have a large number of Hosts
  • You have several datacenters that span across different geographic locations
  • You have networks that have secured zones or firewall rules to segregate Hosts

Opsview uses Collectors to handle the execution and collection of results.

For additional failover and load balancing capabilities, Collectors may be grouped together to form a Cluster.

There should always be an odd number of nodes within a Collector Cluster: 1, 3, 5, etc. This helps with resiliency and avoids split-brain issues.
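The standard quorum arithmetic makes the reasoning concrete: a cluster avoids split-brain when a strict majority of nodes can agree, so an even node count adds capacity but no extra failure tolerance. A minimal Python illustration (not part of Opsview):

    # Strict-majority quorum: the minimum number of nodes that must agree
    # for the cluster to avoid split-brain.
    def quorum(n: int) -> int:
        return n // 2 + 1

    # Nodes that can fail while a majority still survives.
    def tolerated_failures(n: int) -> int:
        return n - quorum(n)

    for n in range(1, 7):
        print(f"{n} node(s): quorum={quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")

    # 3 and 4 nodes both tolerate exactly 1 failure, so the fourth (even)
    # node adds load capacity but no additional split-brain protection.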

Each Host in Opsview is assigned to a Cluster. The Host will be actively checked by any Collector in that Cluster.

Note: In a new system, the first Cluster with a registered Collector assumes the role of 'Master Cluster'. Using the Advanced Automated Installation, the 'Master Cluster' forms part of the 'Opsview Primary Server'.

To set up additional Collectors, you need to:

  • Install the software
  • Register it in Opsview
  • Assign it to a Cluster

These steps are detailed below in Managing Collector Servers.

Opsview Orchestrator

The host where Opsview is first installed is called the Opsview Primary Server. This host has all the necessary software packages installed so that it can function as a single Opsview system, but you can separate out the functional components onto other hosts to spread the load and decrease latency.

The Opsview Primary Server will have the following Host Templates assigned to it:

  • Application - Opsview
  • Opsview - Component - Agent
  • Opsview - Component - Autodiscovery Manager
  • Opsview - Component - BSM
  • Opsview - Component - Datastore
  • Opsview - Component - Downtime Manager
  • Opsview - Component - Executor
  • Opsview - Component - Flow Collector
  • Opsview - Component - Freshness Checker
  • Opsview - Component - License Manager
  • Opsview - Component - Load Balancer
  • Opsview - Component - Machine Stats
  • Opsview - Component - MessageQueue
  • Opsview - Component - Notification Center
  • Opsview - Component - Orchestrator
  • Opsview - Component - Registry
  • Opsview - Component - Results Dispatcher
  • Opsview - Component - Results Flow
  • Opsview - Component - Results Forwarder
  • Opsview - Component - Results Live
  • Opsview - Component - Results Performance
  • Opsview - Component - Results Recent
  • Opsview - Component - Results Sender
  • Opsview - Component - Results SNMP
  • Opsview - Component - Scheduler
  • Opsview - Component - SNMP Traps Collector
  • Opsview - Component - SNMP Traps
  • Opsview - Component - SSH Tunnels
  • Opsview - Component - State Changes
  • Opsview - Component - TimeSeries Enqueuer
  • Opsview - Component - TimeSeries
  • Opsview - Component - TimeSeries RRD
  • Opsview - Component - TimeSeries InfluxDB
  • Opsview - Component - Watchdog
  • Opsview - Component - Web
  • OS - Unix Base
  • Network - Base

This host is automatically assigned to the Master Cluster, and will normally monitor itself.

To add, remove, and register Clusters and Collectors, see Managing Collector Servers.

Troubleshooting

The most common problem is misconfiguration of the Components that require access to the Master MessageQueue Server: the Scheduler and the Results-Sender. Check /var/log/opsview/opsview.log for detailed errors.
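As a quick triage sketch, you could scan that log for broker-connection errors. The search patterns below are assumptions for illustration, not Opsview's exact log messages; adjust them to match what your log actually contains:

    # Sketch: scan the Opsview log for likely MessageQueue connection errors.
    # The patterns are illustrative guesses, not Opsview's exact messages.
    import re

    LOG_PATH = "/var/log/opsview/opsview.log"
    PATTERNS = [re.compile(p, re.IGNORECASE)
                for p in (r"messagequeue", r"rabbitmq", r"amqp", r"connection refused")]

    with open(LOG_PATH, errors="replace") as log:
        for line in log:
            if any(p.search(line) for p in PATTERNS):
                print(line.rstrip())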

Architecture

Opsview Scheduler is the main component of a Collector. It receives commands and configuration from the Orchestrator and schedules execution of monitoring plugins, event handlers and notification scripts.

The execution of plugins is performed by Opsview Executor, whose only job is to execute commands requested by a Scheduler. Results are then sent back to the Opsview Scheduler that requested them.

This approach allows multiple Opsview Executors to be shared among all Collectors of a given Cluster: point all Components at the same Cluster MessageQueue, and load balancing happens automatically.
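The pattern is the classic AMQP work queue: several consumers read from one queue and the broker distributes pending work among them. A minimal sketch with the pika client, where the broker host and queue name are assumptions for illustration and not Opsview's internal protocol:

    # Shared work queue: run several copies of this consumer (one per
    # Executor) and the broker load-balances checks across them.
    import pika

    connection = pika.BlockingConnection(
        pika.ConnectionParameters(host="cluster-mq.example.com"))  # assumed host
    channel = connection.channel()
    channel.queue_declare(queue="checks")  # hypothetical queue name

    # One unacknowledged message at a time: a busy consumer is skipped
    # and an idle one picks up the next check.
    channel.basic_qos(prefetch_count=1)

    def run_check(ch, method, properties, body):
        print(f"executing check: {body.decode()}")
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue="checks", on_message_callback=run_check)
    channel.start_consuming()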

Opsview Scheduler sends the results to Opsview Results-Sender, which will forward them to the Results Processors. In the case of a network outage, the Results-Sender will hold the results for a configurable amount of time.
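That hold-and-forward behaviour can be pictured as a local buffer that gives up on results older than a retention window. A minimal sketch, where the retention value stands in for whatever Opsview's actual configuration setting is:

    # Hold-and-forward buffering during an outage: results queue locally
    # and are dropped once older than an assumed retention window.
    import time
    from collections import deque

    RETENTION_SECONDS = 3600          # assumed configurable hold time
    buffer = deque()                  # (timestamp, result) pairs

    def enqueue(result):
        buffer.append((time.time(), result))

    def flush(send):
        """Forward buffered results; keep whatever fails to send."""
        cutoff = time.time() - RETENTION_SECONDS
        while buffer:
            ts, result = buffer[0]
            if ts < cutoff:           # too old: discard
                buffer.popleft()
            elif send(result):        # delivered: drop from buffer
                buffer.popleft()
            else:                     # network still down: retry later
                break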

Scalability

For high availability, we recommend a single monitoring Cluster per monitored location (e.g. a datacenter) with as many Collector nodes as required. All Collectors should point to a single Cluster MessageQueue Server. For more information and assistance, contact our Customer Success Team.

Security

To secure communication over the network, refer to the Securing Network Connection documentation.

Failure Scenarios

Opsview 6 can handle n-1 Collector failures within a Monitoring Cluster, and since there is no upper limit on the number of Collectors in a Cluster, we recommend at least three nodes per Cluster. If a Collector fails, the Orchestrator detects this within 60 seconds and automatically re-assigns the hosts monitored by the failed Collector to the remaining Collectors in the Cluster. The re-assignment uses the current known state of the objects and the configuration from the last time you performed an Apply Changes from the Configuration menu. Re-assigned hosts and their services are immediately re-checked.
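The re-assignment step can be pictured as redistributing the failed Collector's hosts across the survivors. This round-robin sketch is an illustration only, not Opsview's actual scheduling algorithm:

    # Illustrative round-robin re-assignment of a failed Collector's hosts.
    from itertools import cycle

    def reassign(assignments: dict[str, list[str]], failed: str) -> dict[str, list[str]]:
        """assignments maps collector name -> list of hosts it monitors."""
        orphans = assignments.pop(failed, [])
        survivors = cycle(assignments)
        for host in orphans:
            assignments[next(survivors)].append(host)
        return assignments

    cluster = {"collector-a": ["web1", "web2"],
               "collector-b": ["db1"],
               "collector-c": ["db2"]}
    print(reassign(cluster, "collector-a"))
    # {'collector-b': ['db1', 'web1'], 'collector-c': ['db2', 'web2']}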

When the Collector recovers, the Orchestrator automatically re-assigns its hosts back to it.
