Monitoring for MSPs - Devices per customer
Building on my previous blog on SLA monitoring with Nagios, which looked at measuring availability at a “group” level (where a group is a customer or department), I next want to look at another figure that a lot of MSPs will want to know, and which differs from what their standard view shows.
How many devices per customer?
In other words: "For how many managed devices should I charge the customer?" or "At the end of the month, how much revenue can I expect to book?"
At the moment in Opsview we can look at the number of hosts in a host group using a view similar to the one below:
Which is all well and good: we can see we have six “Databases” (Oracle, MySQL, and so on) and three devices that live in the “Customers” host group. However, we can’t alert based upon these values, nor can we graph them.
Until now! *dun dun dunnnn*!!
Using my highly limited development skills (that is, asking people “how do you do this?” and so on), I’ve created a plugin called check_opsview_hostcount. It takes a host group and outputs the number of hosts that live in that group in Nagios ‘perf data’ format. This is important because we can then use the output in a graph, which means we can now graph, report and more, based upon the host group total.
What we can also do is use the “warning/critical” flags (-w/-c) to alert us when we are over a certain number. This can be particularly useful if, for example, your MSP will charge you twice as much once you go over 1000 hosts. By using this new plugin, you can now be alerted!
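To make the output format concrete, here is a minimal sketch in Python of how a host count could be turned into a Nagios-style status line with perf data. It is not the actual check_opsview_hostcount source: the function name, the "hosts" perf data label and the wording of the status text are assumptions made for illustration, and the count is passed in rather than looked up in Opsview.

```python
# Hypothetical sketch: NOT the real check_opsview_hostcount implementation.
# The function name, the "hosts" label and the message text are assumptions.

OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios plugin exit codes


def evaluate_host_count(hostgroup, count, warn=None, crit=None):
    """Return (exit_code, output_line) for a host count.

    The output line is status text, a pipe, then perf data in the
    Nagios form label=value;warn;crit so the value can be graphed.
    """
    if crit is not None and count > crit:
        code, label = CRITICAL, "CRITICAL"
    elif warn is not None and count > warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"

    # Unset thresholds are left empty in the perf data fields.
    warn_field = "" if warn is None else str(warn)
    crit_field = "" if crit is None else str(crit)
    perfdata = f"hosts={count};{warn_field};{crit_field}"

    return code, f"{label} - {count} hosts in host group '{hostgroup}' | {perfdata}"


if __name__ == "__main__":
    exit_code, line = evaluate_host_count("Databases", 6, warn=8, crit=9)
    print(line)  # OK - 6 hosts in host group 'Databases' | hosts=6;8;9
```

A real plugin would finish by exiting with the returned code, since that exit status is what Nagios uses to decide whether the check is OK, warning or critical.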
Bringing it together
So, here are a few cool things we can now do because of this plugin.
1. Show host counts as service checks
By using the “Multiple attributes” option in Opsview, I can create one service check called “Hosts:” and have it use a syntax of:
check_opsview_hostcount --hostgroup "%HOSTGROUPATTR%"
This means that each time I add an attribute to the host against which this check is applied, a new service check is automatically added. What does this look like? Well, I only created one check, and I only apply it to my host once (hosts, as below):
However, each time I add an attribute, it creates a new check. For example, because I have added seven attributes of the type “HOSTGROUPATTR” to my “dummy host”, with the values of my host groups:
I now get seven checks created, as below. This will work with any service check scenario, on any version of Opsview:
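The effect of the attribute expansion can be pictured with a small Python sketch. This only illustrates the idea of one template becoming one command per attribute value; it is not how Opsview implements it, and the function name is made up for the example. The two host group names are the ones mentioned earlier in this post.

```python
# Illustration only: Opsview does this expansion internally.
# expand_service_checks is a hypothetical helper name.

def expand_service_checks(template, macro, values):
    """Return one command line per attribute value by substituting %MACRO%."""
    placeholder = f"%{macro}%"
    return [template.replace(placeholder, value) for value in values]


if __name__ == "__main__":
    template = 'check_opsview_hostcount --hostgroup "%HOSTGROUPATTR%"'
    for command in expand_service_checks(template, "HOSTGROUPATTR",
                                         ["Databases", "Customers"]):
        print(command)
```

Adding a third value to the list yields a third command without touching the template, which is the same "edit once" property described here.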
This makes my setup very neat and tidy. Each time I want to modify the check, I edit it once, not 10,000 times.
2. Show my results in a pie chart
Here we can see the split of “customers vs. total”, so if one customer is using up a lot of our host count, then maybe it’s time to offload them to a separate system or dedicated server.
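The arithmetic behind such a chart is simply each customer's host count as a share of the total. A rough Python sketch, with made-up customer names and counts purely for illustration:

```python
# Sketch of the "customers vs. total" split; names and numbers are invented.

def host_count_shares(counts):
    """Return each customer's percentage of the total host count."""
    total = sum(counts.values())
    if total == 0:
        # Avoid dividing by zero when nothing is monitored yet.
        return {customer: 0.0 for customer in counts}
    return {customer: round(100.0 * n / total, 1)
            for customer, n in counts.items()}


if __name__ == "__main__":
    example = {"Customer A": 60, "Customer B": 30, "Customer C": 10}
    for customer, share in host_count_shares(example).items():
        print(f"{customer}: {share}%")
```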
3. Use performance gauges
One of my personal favourites is the performance gauge. Here we can set our thresholds so that if a customer’s host count goes above eight it’s a warning, and above nine it’s critical. This allows us another great “at a glance” view into our operations:
4. Bring it all together
Finally, we can bring together our...
• Hosts count – How many devices are we monitoring for each customer?
• Host group SLA – What’s the availability for our customer, for the duration given (1 week, 2 weeks, etc.)?
• Pie Chart – Of our 100 monitored devices, how many are from a single customer? Also, what did our customer breakdown look like 1 month ago, 1 year ago and so forth?
Conclusion and notes
So there we have it. We can now monitor SLAs, look at customer device count and create alerts if it gets too high, as well as display it graphically for use in reports. I think these are really rather useful tools for MSPs to have in their arsenal.
Note: If you want a copy of the plugin, feel free to leave a message / comment.