How to Build a Bullet-Proof* On-Prem Monitoring Server

Setting aside the cloud-based versus on-premises debate for your IT monitoring system, this article covers the steps you can take to build a resilient monitoring server in your data center.

There are many considerations when installing a monitoring server locally. The steps below aim to improve the chances that your monitoring server stays available during a service outage at your organization.

  1. Name Resolution

    Add local hosts file entries for your database server and any other monitoring servers, covering both forward and reverse name resolution. You can also add entries for key infrastructure such as core routers and critical app servers. Beware, though: if any of the addresses change, you will need to update your hosts file. Another option is to use IP addresses rather than hostnames to avoid relying on your DNS infrastructure, although this has the disadvantage that it is harder to maintain and makes it easier to end up monitoring the ‘wrong’ device.
  2. Shared Infrastructure

    For a resilient monitoring server, local storage is much preferred. Avoid SAN or other shared storage; use a local RAID array, HDD, SSD, hybrid drive, or anything else that is locally installed.
  3. Platform

    If possible, use a physical server. It can be a lifesaver when mysterious and unexplained outages occur on the virtualization platform. This request may seem at odds with the goals of your IT organization, but the cost savings from virtualizing your monitoring server can be far outweighed by the impact of a single undetected outage. If you are required to use a virtual server, you can still add some protection against failure. Options include OS-level HA across multiple physical hosts, application-level redundancy such as DR sync, and database copies.
  4. Database

    Using a shared database exposes the monitoring system to unintended consequences of patching, updates, and upgrades carried out in support of other applications that use the same database. As with virtualization, further issues can arise from noisy neighbors that use more than their fair share of resources. A database dedicated to the monitoring application is preferred.
  5. Local Collector

    When the network fails, a local collector installed in your monitored environment can continue to process events from agents or agentless checks and save them to a buffer until connectivity is restored. Some local collectors can also perform actions, send notifications, and run fix-it scripts while disconnected from the mothership.
  6. Backups

    In addition to centralized backup software solutions, make sure you have copies of any configurations including users, dashboards, searches, etc. You can save these on a shared drive, but a properly paranoid monitoring administrator will also save to other destinations that are infosec approved.
  7. Local User Accounts

    LDAP and AD can fail and prevent you and your users from accessing the monitoring consoles. A good compromise is to have a backup set of locally authenticated accounts you and your users can revert to when shared authentication services fail.
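
For step 1, the main risk of local hosts file entries is that they drift out of date when addresses change. The sketch below shows one way to audit them: it parses hosts-file text and compares it against an inventory of expected addresses, checking both the forward (name to IP) and reverse (IP to first listed name) direction. The function names, the sample hostnames, and the inventory format are assumptions made for illustration, and the "first entry wins" rule is a simplification of how resolvers treat duplicate entries.

```python
def parse_hosts(text):
    """Return (forward, reverse) maps built from hosts-file text.

    forward: hostname -> IP (first entry wins; a simplifying assumption)
    reverse: IP -> first hostname listed on the first line for that IP
    """
    forward, reverse = {}, {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        ip, *names = line.split()
        if not names:
            continue  # an address with no names is not a usable entry
        reverse.setdefault(ip, names[0].lower())
        for name in names:
            forward.setdefault(name.lower(), ip)
    return forward, reverse


def check_hosts(text, expected):
    """Compare hosts-file text against an inventory of {hostname: ip}."""
    forward, reverse = parse_hosts(text)
    problems = []
    for name, ip in expected.items():
        name = name.lower()
        found = forward.get(name)
        if found is None:
            problems.append(f"{name}: no forward entry")
        elif found != ip:
            problems.append(f"{name}: hosts file has {found}, inventory has {ip}")
        elif reverse.get(ip) != name:
            problems.append(
                f"{ip}: reverse lookup gives {reverse.get(ip)}, expected {name}"
            )
    return problems


# Hypothetical hosts-file content and inventory, for illustration only.
SAMPLE = """
127.0.0.1   localhost
10.0.0.5    monitor-db.example.internal monitor-db   # database server
10.0.0.1    core-rtr1.example.internal
"""
INVENTORY = {
    "monitor-db.example.internal": "10.0.0.5",
    "core-rtr1.example.internal": "10.0.0.2",  # address changed, entry is stale
    "app1.example.internal": "10.0.0.20",      # never added to the hosts file
}

for problem in check_hosts(SAMPLE, INVENTORY):
    print(problem)
```

In practice you would read the text from the real hosts file (for example `/etc/hosts` on Linux) and run the check whenever the inventory changes. Because it only reads a local file, it keeps working when DNS is down.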

*All software fails; the goal is to fail at a different time from your monitored infrastructure.
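
The buffering behavior described for local collectors in step 5 is often called store-and-forward. The following is a minimal sketch of the idea, not any particular product's implementation: events are queued locally, sent in order, and removed from the queue only after a successful send. `BufferingCollector`, `FakeUplink`, and the event strings are hypothetical names, and the buffer here lives in memory, whereas a real collector would more likely persist it to disk so that it survives a restart.

```python
from collections import deque


class BufferingCollector:
    """Forward events upstream, buffering locally while the link is down."""

    def __init__(self, send, max_buffered=10_000):
        # send: callable(event) that raises ConnectionError when upstream
        # is unreachable (an assumed contract for this sketch).
        self._send = send
        # Bounded buffer: once full, the oldest events are discarded.
        self._buffer = deque(maxlen=max_buffered)

    def submit(self, event):
        """Queue an event, then try to deliver everything that is pending."""
        self._buffer.append(event)
        self.flush()

    def flush(self):
        """Send buffered events in order; return how many were delivered."""
        sent = 0
        while self._buffer:
            try:
                self._send(self._buffer[0])
            except ConnectionError:
                break  # still disconnected: keep the events for later
            self._buffer.popleft()  # remove only after a successful send
            sent += 1
        return sent

    @property
    def pending(self):
        return len(self._buffer)


class FakeUplink:
    """Stand-in for the connection to the central monitoring server."""

    def __init__(self):
        self.up = True
        self.received = []

    def __call__(self, event):
        if not self.up:
            raise ConnectionError("central server unreachable")
        self.received.append(event)


uplink = FakeUplink()
collector = BufferingCollector(uplink)

collector.submit("disk ok")                 # delivered immediately
uplink.up = False                           # simulate a network outage
collector.submit("link down on core-rtr1")  # buffered
collector.submit("ping timeout app1")       # buffered
uplink.up = True                            # connectivity restored
collector.flush()                           # buffered events go out in order
```

Bounding the buffer is a deliberate trade-off: during a long outage the collector loses its oldest events rather than exhausting local memory.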
