How to Build a Bullet-Proof* On-Prem Monitoring Server
Setting aside the debate over cloud-based versus on-premises IT monitoring, we'll discuss the steps you can take to build a resilient monitoring server in your data center.
There are many considerations when installing a monitoring server locally. We will cover several steps that improve the chances that your monitoring server remains available during a service outage at your organization.
Name Resolution
Add local hosts file entries for your database server and any other monitoring servers, covering both forward (name to address) and reverse (address to name) resolution. You can also add entries for key infrastructure such as core routers and critical application servers. Be aware, though, that if any of these addresses change you will need to update your hosts file. Another option is to use IP addresses rather than hostnames so you do not rely on your DNS infrastructure at all, although this makes the configuration harder to maintain and makes it easier to end up monitoring the 'wrong' device.
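As a rough sketch of the hosts-file approach, the Python below renders entries for a few critical dependencies and checks that forward and reverse lookups agree when the file is parsed back. The host names and addresses in `CRITICAL_HOSTS` are hypothetical placeholders, and the parser assumes the common convention that the first name on a line is the canonical name returned for a reverse lookup.

```python
import ipaddress

# Hypothetical dependencies of the monitoring server; replace with your own.
CRITICAL_HOSTS = {
    "monitor-db01.example.internal": "10.0.10.21",
    "monitor-collector01.example.internal": "10.0.10.31",
    "core-rtr01.example.internal": "10.0.0.1",
}


def render_hosts_entries(hosts: dict[str, str]) -> str:
    """Return hosts-file lines '<address> <fqdn> <short name>', sorted numerically by address."""
    lines = []
    for fqdn, address in sorted(hosts.items(), key=lambda item: ipaddress.ip_address(item[1])):
        short_name = fqdn.split(".", 1)[0]
        lines.append(f"{address}\t{fqdn} {short_name}")
    return "\n".join(lines) + "\n"


def parse_hosts(text: str) -> tuple[dict[str, str], dict[str, str]]:
    """Parse hosts-file text into forward (name -> address) and reverse (address -> name) maps."""
    forward: dict[str, str] = {}
    reverse: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            forward.setdefault(name, address)  # first matching line wins
        if names:
            reverse.setdefault(address, names[0])  # first name is treated as canonical
    return forward, reverse


def check_entries(hosts: dict[str, str], hosts_text: str) -> list[str]:
    """List every expected host whose forward or reverse mapping does not match."""
    forward, reverse = parse_hosts(hosts_text)
    problems = []
    for fqdn, address in hosts.items():
        if forward.get(fqdn) != address:
            problems.append(f"forward mismatch for {fqdn}")
        if reverse.get(address) != fqdn:
            problems.append(f"reverse mismatch for {address}")
    return problems


if __name__ == "__main__":
    entries = render_hosts_entries(CRITICAL_HOSTS)
    print(entries, end="")
    print(check_entries(CRITICAL_HOSTS, entries) or "forward and reverse entries agree")
```

Running `check_entries` against the live hosts file after an address change is one way to catch stale entries before they cause a false alarm.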
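Shared Infrastructure
For a resilient monitoring server, local storage is much preferred over shared storage such as a SAN. Use a local RAID array, HDD, SSD, hybrid drive, or anything else installed locally, so that an outage of the shared storage cannot take the monitoring server down along with the systems it watches.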
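One way to audit this on a Linux host is to find which mount backs the monitoring server's data directory and check its filesystem type. The Python sketch below parses text in the `/proc/mounts` format; the set of "shared" filesystem types and the data directory path are assumptions to adjust. Note that a SAN LUN attached over iSCSI or Fibre Channel usually appears as an ordinary local block device (for example ext4 or xfs), so this check only catches network filesystems such as NFS or SMB, not SAN-backed volumes.

```python
import os

# Filesystem types treated as shared/network storage here (illustrative, not exhaustive).
NETWORK_FS_TYPES = {"nfs", "nfs4", "cifs", "smb3", "ceph", "glusterfs", "fuse.sshfs"}


def mount_for_path(path: str, mounts_text: str) -> tuple[str, str]:
    """Return (mount_point, fstype) of the most specific mount containing path.

    Assumes mount points without spaces (/proc/mounts escapes them as \\040).
    """
    best = ("", "")
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fstype = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        contains = path == mount_point or path.startswith(prefix)
        # ">=" lets a later mount on the same point shadow an earlier one.
        if contains and len(mount_point) >= len(best[0]):
            best = (mount_point, fstype)
    return best


def is_on_shared_storage(path: str, mounts_text: str) -> bool:
    """True if path appears to live on a network filesystem."""
    _, fstype = mount_for_path(path, mounts_text)
    return fstype in NETWORK_FS_TYPES


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as mounts_file:
        mounts = mounts_file.read()
    data_dir = os.path.realpath("/var/lib/mysql")  # hypothetical database data directory
    print(data_dir, "on shared storage:", is_on_shared_storage(data_dir, mounts))
```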
Platform
If possible, use a physical server; it can be a lifesaver when mysterious, unexplained outages hit the virtualization platform. This may seem at odds with the goals of your IT organization, but the cost savings from virtualizing your monitoring server can be far outweighed by the impact of a single undetected outage. If you are required to use a virtual server, you can still add some protection against failure, for example OS-level high availability (HA) across multiple physical hosts, application-level redundancy such as disaster recovery (DR) sync, and database copies.
Database
Using a shared database can have unintended consequences, such as patches, updates, and upgrades applied to support the other applications that use the same database. Much like with virtualization, noisy neighbors that consume more than their fair share of resources can cause further issues. A database dedicated to the monitoring application is preferred.
Local Collector
When the network fails, a local collector installed in your monitored environment can continue to process events from agents and agentless checks, buffering them until connectivity is restored. Some local collectors can also run actions, notifications, and fix-it scripts while disconnected from the mothership.
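The buffering behavior described above is essentially a store-and-forward queue. The Python sketch below shows the idea in memory only; `send` stands in for whatever transport your collector actually uses and is assumed to raise `ConnectionError` while the central server is unreachable. A real collector would normally persist its buffer to disk so events survive a restart.

```python
from collections import deque
from typing import Callable


class StoreAndForwardBuffer:
    """Queue events while the central server is unreachable and forward them in order later."""

    def __init__(self, send: Callable[[dict], None], max_events: int = 10_000):
        self._send = send  # hypothetical transport; raises ConnectionError when upstream is down
        self._pending = deque(maxlen=max_events)  # when full, the oldest events are discarded

    def __len__(self) -> int:
        return len(self._pending)

    def submit(self, event: dict) -> None:
        """Accept a new event, then try to forward everything that is queued."""
        self._pending.append(event)
        self.flush()

    def flush(self) -> int:
        """Forward queued events oldest-first, stopping at the first failure. Returns the number sent."""
        sent = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                break  # still disconnected: keep the event for the next attempt
            self._pending.popleft()
            sent += 1
        return sent


class FakeUpstream:
    """Stand-in for the central server, used only to demonstrate the buffer."""

    def __init__(self) -> None:
        self.online = False
        self.received: list[dict] = []

    def send(self, event: dict) -> None:
        if not self.online:
            raise ConnectionError("central server unreachable")
        self.received.append(event)


if __name__ == "__main__":
    upstream = FakeUpstream()
    buffer = StoreAndForwardBuffer(upstream.send, max_events=100)
    for check in ("disk", "cpu", "ping"):
        buffer.submit({"check": check, "state": "CRITICAL"})
    print("buffered while offline:", len(buffer))
    upstream.online = True  # connectivity restored
    print("forwarded on reconnect:", buffer.flush())
```

Whether a full buffer should drop the oldest or the newest events is a policy choice; the bounded `deque` here drops the oldest.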
Backups
In addition to your centralized backup software, make sure you keep copies of all configuration, including users, dashboards, searches, etc. You can save these on a shared drive, but a properly paranoid monitoring administrator will also save them to other, infosec-approved destinations.
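A minimal Python sketch of the "more than one copy" habit: copy an exported configuration file to several destinations under a timestamped name and verify each copy by SHA-256 checksum. How you export users, dashboards, and searches depends on your monitoring tool and is not shown; the file name and destination paths here are hypothetical, and choosing the infosec-approved destinations is up to you.

```python
import hashlib
import shutil
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large exports do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate_backup(export_file: Path, destinations: list[Path]) -> dict[Path, bool]:
    """Copy export_file into every destination directory and verify each copy.

    Returns {destination: True if the copy was written and its checksum matches}.
    """
    expected = sha256_of(export_file)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    results: dict[Path, bool] = {}
    for dest_dir in destinations:
        try:
            dest_dir.mkdir(parents=True, exist_ok=True)
            target = dest_dir / f"{export_file.stem}-{stamp}{export_file.suffix}"
            shutil.copy2(export_file, target)
            results[dest_dir] = sha256_of(target) == expected
        except OSError:
            results[dest_dir] = False  # an unreachable destination should not block the others
    return results


if __name__ == "__main__":
    # Demo against throwaway directories; real destinations might be a shared
    # drive plus an off-host location approved by your security team.
    with tempfile.TemporaryDirectory() as scratch:
        root = Path(scratch)
        export = root / "monitoring-config.json"  # hypothetical export file
        export.write_text('{"users": [], "dashboards": [], "searches": []}')
        print(replicate_backup(export, [root / "shared-drive", root / "offsite-copy"]))
```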
Local User Accounts
LDAP and Active Directory (AD) can fail and prevent you and your users from accessing the monitoring consoles. A good compromise is to keep a backup set of locally authenticated accounts that you and your users can fall back to when shared authentication services fail.
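The fallback decision can be sketched in Python as follows. `directory_check` is a hypothetical stand-in for an LDAP or AD bind and is assumed to raise `DirectoryUnavailable` when the directory cannot be reached; local passwords are stored as salted PBKDF2 hashes using only the standard library. Many monitoring products offer their own local-account feature, so treat this as an illustration of the logic rather than a drop-in component.

```python
import hashlib
import hmac
import os
from typing import Callable, Dict, Optional, Tuple

ITERATIONS = 600_000  # PBKDF2-HMAC-SHA256 work factor; tune for your hardware


class DirectoryUnavailable(Exception):
    """Raised by the directory check when LDAP/AD cannot be reached."""


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest) for storing a local account password."""
    salt = os.urandom(16) if salt is None else salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def authenticate(
    username: str,
    password: str,
    directory_check: Callable[[str, str], bool],
    local_accounts: Dict[str, Tuple[bytes, bytes]],
) -> bool:
    """Use the directory when it is reachable; fall back to local accounts only when it is not."""
    try:
        # A definite "no" from the directory is final, so disabled directory
        # users cannot slip in through a stale local account.
        return directory_check(username, password)
    except DirectoryUnavailable:
        record = local_accounts.get(username)
        if record is None:
            return False
        salt, expected = record
        _, candidate = hash_password(password, salt)
        return hmac.compare_digest(candidate, expected)  # constant-time comparison


if __name__ == "__main__":
    local = {"monitor-breakglass": hash_password("change-me")}  # hypothetical account

    def ldap_down(user: str, pw: str) -> bool:
        raise DirectoryUnavailable("directory server unreachable")

    print(authenticate("monitor-breakglass", "change-me", ldap_down, local))  # True
```

Falling back only on an unreachable directory, never on a rejected login, keeps the local accounts a break-glass path rather than a second, weaker front door.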
*All software fails; the goal is to fail at a different time from your monitored infrastructure.