Notes on a New Architecture

Opsview Monitor 6 vastly improved scale and performance

In addition to adding new monitoring capabilities, a main focus of Opsview Monitor’s development has been to improve performance and deliver greater scalability, ensuring that customers continue to enjoy a great user experience as they expand their use of the product across their growing and changing infrastructure environments.

Increasingly, customers are demanding that monitoring tools work at significant scale. Rather than monitoring thousands or tens of thousands of hosts (endpoints), large enterprises and MSPs may now need to monitor hundreds of thousands of hosts on a single system. To deliver on this expectation, and to deliver best-in-class performance to customers operating at smaller scales, Opsview has created an entirely new architecture for its latest release, Opsview Monitor 6.0.

This new architecture has allowed for the Nagios® engine to be completely removed, while still retaining full compatibility with Nagios plugins.  This provides backwards compatibility with plugins for existing Opsview Monitor users so that a smooth upgrade path is assured, and also means that customers are still able to use any of the thousands of plugins available from the community.
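To make the plugin compatibility concrete, here is a minimal sketch of the widely used Nagios plugin convention: the exit code carries the state (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the first line of output carries a message, optionally followed by performance data after a `|`. The function name `run_plugin` and the stand-in plugin command are illustrative assumptions, not Opsview Monitor's own executor code.

```python
import subprocess
import sys

# Exit codes defined by the Nagios plugin convention.
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_plugin(argv, timeout=10):
    """Run a plugin command and return (state, output, perfdata)."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return "UNKNOWN", "plugin timed out", ""
    # Any exit code outside 0-3 is treated as UNKNOWN.
    state = STATES.get(proc.returncode, "UNKNOWN")
    lines = proc.stdout.strip().splitlines()
    first_line = lines[0] if lines else ""
    # Text before "|" is the human-readable message, text after is perfdata.
    output, _, perfdata = first_line.partition("|")
    return state, output.strip(), perfdata.strip()


# Stand-in for a real plugin (hypothetical): prints one status line, exits 1.
fake_plugin = [
    sys.executable,
    "-c",
    "import sys; print('DISK WARNING - 85% used | used=85%;80;90'); sys.exit(1)",
]
result = run_plugin(fake_plugin)
print(result)
```

Because the contract is only an exit code and a line of text, any engine that honours it can run existing community plugins unchanged.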

Microservices

For maximum flexibility, scalability and performance, Opsview Monitor 6.0’s new architecture is based on microservices, where a number of separate components (processes) perform discrete tasks. Instead of linking these microservice processes using sockets or classic inter-process communication (IPC) methods, we connect them via a message bus: an abstract communication system with built-in resiliency features that is performant, robust, and easy to use in highly dynamic environments.

Figure 1. Detailed diagram of Opsview Monitor 6.0 microservices.

Each component is managed with its own local configuration file and can be run alongside other components on a single server, or scaled out horizontally across multiple servers. Components which affect scalability, such as those which are executing plugins to collect monitoring data, can be duplicated as many times as is required in order to deliver greater throughput and better utilize underlying resources. Further performance and predictability gains can be made by grouping components of the same type to create function-dedicated servers.
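The idea of duplicating a component to gain throughput can be sketched as several identical workers pulling from one shared queue. This is a toy, in-process illustration using threads. The names `run_checks` and `executor-N` and the host list are hypothetical, and in the real product the workers are separate processes connected by the message bus rather than threads.

```python
import queue
import threading


def run_checks(checks, executor_count):
    """Share a list of checks between executor_count identical workers."""
    pending = queue.Queue()
    for check in checks:
        pending.put(check)

    results = []
    lock = threading.Lock()

    def executor(name):
        # Each worker takes the next available check until none are left.
        while True:
            try:
                check = pending.get_nowait()
            except queue.Empty:
                return
            with lock:
                results.append((name, check))

    workers = [
        threading.Thread(target=executor, args=(f"executor-{i}",))
        for i in range(executor_count)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results


checks = [f"host{n}:ping" for n in range(20)]  # made-up check names
results = run_checks(checks, executor_count=4)
print(len(results), "checks handled")
```

Adding capacity here means raising `executor_count`. Nothing else in the flow changes, which is the property the architecture relies on when executors are added.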

RabbitMQ and CouchDB

RabbitMQ provides reliable, scalable messaging across the system, including end-to-end encryption both in transit and at rest. It was chosen over alternatives such as Kafka mainly for its simplicity of configuration, particularly in clustered environments. An additional benefit of using a queue-based tool for messaging is that integration with third-party systems becomes even easier than in earlier releases of Opsview Monitor. Now, a few lines of code can create a queue listener which will pass live monitoring data, such as check results or alerts, directly into a data lake or analysis tool which may already be in place in your environment.
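A queue listener of the kind described might look roughly like the sketch below. It assumes the third-party `pika` client for RabbitMQ. The queue name, the broker host and the message fields (`host`, `service`, `state`) are placeholders rather than Opsview Monitor's documented names. The `sink` list stands in for a data lake or analysis tool.

```python
import json


def handle_message(body, sink):
    """Decode one JSON message from the bus and append it to the sink."""
    record = json.loads(body)
    sink.append(record)
    return record


def listen(queue_name, sink, host="localhost"):
    """Blocking consumer. Needs the pika package and a reachable broker."""
    import pika  # imported lazily so the handler above runs without it

    connection = pika.BlockingConnection(pika.ConnectionParameters(host=host))
    channel = connection.channel()

    def on_message(ch, method, properties, body):
        handle_message(body, sink)
        # Acknowledge only after the record has been handed to the sink.
        ch.basic_ack(delivery_tag=method.delivery_tag)

    channel.basic_consume(queue=queue_name, on_message_callback=on_message)
    channel.start_consuming()


# Exercise the handler with a made-up payload; real field names may differ.
sink = []
sample = b'{"host": "web01", "service": "HTTP", "state": "CRITICAL"}'
handle_message(sample, sink)
print(sink)
```

Keeping the decoding step in `handle_message`, separate from the broker wiring in `listen`, means the forwarding logic can be tested without a running RabbitMQ instance.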

CouchDB provides a fast and resilient NoSQL data store which Opsview Monitor uses to hold live monitoring state and other runtime data needed by the various system components. All data exchanged over the message bus is JSON-formatted, so a data store which speaks JSON natively is an ideal choice, especially as it also provides fault tolerance in a highly distributed system.
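Because CouchDB exposes documents over HTTP as JSON, a message taken from the bus can be stored with very little translation. The sketch below only builds the `PUT /{db}/{docid}` request with the standard library and does not send it. The database name, document ID and fields are invented for illustration and are not Opsview Monitor's actual schema.

```python
import json
import urllib.request
from urllib.parse import quote


def build_put_request(base_url, database, doc_id, document):
    """Build a CouchDB 'create or update document' request (PUT /{db}/{docid})."""
    url = "{}/{}/{}".format(
        base_url.rstrip("/"), quote(database, safe=""), quote(doc_id, safe="")
    )
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical state document; 5984 is CouchDB's default port.
state = {"host": "web01", "service": "HTTP", "state": "OK"}
request = build_put_request(
    "http://localhost:5984", "monitoring_state", "web01:HTTP", state
)
print(request.get_method(), request.full_url)
# To send it against a live server: urllib.request.urlopen(request)
```

Updating an existing document would also need its current `_rev` value in the body, which this sketch leaves out.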

Even though the microservices-based architecture makes it possible to scale horizontally, a smaller system monitoring only a few thousand hosts could be run on perhaps just three or four servers: a database, an orchestrator server (where the web UI can be found), and two collector nodes for running the checks. This is the same architecture as would be used on earlier versions of Opsview Monitor, which highlights the fact that microservices do not necessarily create additional complexity.

In larger environments, scaling to tens of thousands of hosts and beyond is best achieved using several separate servers, which can be bare metal or virtual machines, on-premises or in a public or private cloud. Figure 2 shows an architecture diagram for a larger system, monitoring 10,000 hosts. There are many variables to consider when architecting a large system, and this diagram is just an example of one possible solution. Adding more collector clusters, schedulers and executors would allow this system to be scaled much further still.

Figure 2. An example deployment of Opsview Monitor 6.0, used for benchmarking.

Deploying and managing all these components, plus additional software, may sound complex, but in Opsview Monitor 6.0, these tasks don’t need to be challenging.  Since early in the development of this new architecture, Opsview has used Ansible – a leading open source IT automation framework – for Opsview Monitor lifecycle management. It is now possible to deploy and configure the entire system through Ansible playbooks which are provided out of the box.  Components are installed, configured and connected to each other without manual intervention, greatly reducing both the learning curve and the risk of manual configuration errors being introduced.

Whatever the size of your IT environment, Opsview Monitor 6.0’s new architecture can scale to meet your needs.  With fast, automated deployment and comprehensive integrations with tools such as configuration management and service desks, getting broad and deep monitoring coverage of your critical business systems is easier than you think.

by Neil Ferguson,
Director of Customer Success
Neil is responsible for ensuring that all of our customers receive the best possible service from us, and realise the maximum value from their Opsview Monitor subscription. Whether it’s looking after our infrastructure, managing the Customer Success Teams or visiting our customers and prospects in person, Neil is involved in many aspects of Opsview’s business. Outside of work he is a home improvement and gardening fanatic as well as an amateur home brewer and private pilot, and is always busy with hobbies or spending time with his family.
