
Monitor but exclude from alerts

mobiusnz

We've got a few 'busy' hosts that use 100% CPU and 95% RAM for a couple of hours more or less every day. According to the host's maintainers, fixing this is in the too-hard basket, and they don't really care about the CPU/RAM usage. We'd still like to monitor and graph it, but how can we exclude these service notifications on these specific hosts? My thought was to use a keyword like "not-cpu" and exclude those hosts from notifications, but there doesn't seem to be any way to filter notifications by keyword exclusion.

Also, sticky acknowledgements don't work, because the server goes back to green once it has finished what it is doing each day.

Any ideas?

andym_cv

We have done a similar thing with a couple of checks, such as Physical Memory. All we did was remove the warning and critical thresholds from the service check arguments.

The standard arguments for the CPU Utilisation service check are:

-H $HOSTADDRESS$ -c nsc_checkcpu -a 'warn=90 crit=95 time=10m time=1m ShowAll=long'

Make a copy of that service check (or edit the original if you prefer) then remove the warn and crit arguments so it becomes:

-H $HOSTADDRESS$ -c nsc_checkcpu -a 'time=10m time=1m ShowAll=long'
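Why this stops the alerts but keeps the graphs: a Nagios-style check reports its state through its exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN) and reports the measured value as performance data after the `|` in its output. The Python below is a minimal sketch of that logic, not NSClient++'s actual code; `evaluate_cpu` is a hypothetical helper, and it assumes the usual Nagios threshold meaning (alert when the value goes above the threshold). With no warn/crit given, the state can never leave OK, yet the value is still emitted for graphing.

```python
from typing import Optional, Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate_cpu(load_pct: float,
                 warn: Optional[float] = None,
                 crit: Optional[float] = None) -> Tuple[int, str]:
    """Return (exit_code, output_line) for a CPU load percentage.

    Hypothetical helper: thresholds that are None are simply skipped,
    so with neither set the result is always OK.
    """
    if crit is not None and load_pct > crit:
        code = CRITICAL
    elif warn is not None and load_pct > warn:
        code = WARNING
    else:
        code = OK
    # Perfdata format: label=value[UOM];warn;crit;min;max (empty fields allowed).
    w = "" if warn is None else f"{warn:g}"
    c = "" if crit is None else f"{crit:g}"
    perf = f"cpu={load_pct:g}%;{w};{c};0;100"
    return code, f"{STATUS_NAMES[code]}: CPU load {load_pct:g}% | {perf}"


if __name__ == "__main__":
    print(evaluate_cpu(100.0, warn=90, crit=95))  # with thresholds: CRITICAL
    print(evaluate_cpu(100.0))                    # without thresholds: still OK, still graphed
```

The second call returns exit code 0 with `cpu=100%;;;0;100` as its performance data, which is the "monitor but don't alert" behaviour the thread is after.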

andym_cv

Actually, I expect you want to leave the normal "CPU Utilisation" service check as it is for your other servers.

Just make a duplicate called "CPU Utilisation no alerts", make the above edits, then assign it to the required servers instead of the normal CPU monitor.

pcmerc

Actually, you'd handle this in the host config, as you can choose to tweak, exclude, or set other options for a monitor on a per-host basis.

So, say you have a host that has a host template of checks applied. You'd do the following:

Settings > Hosts > select the host > Monitors > expand the Service Group, make sure there is a + next to the service, and then click the exception icon (purple star) off to the right.

Change the check however you'd like for that host.

You can also easily exclude the check by changing the + to a -.
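One way to picture what an exception and an exclusion each do to a host's list of checks is the small Python model below. It is a hypothetical sketch, not Opsview's internal data model: `Host`, `effective_checks`, and the field names are illustrative only. An exception swaps in different arguments for one check on one host; an exclusion (the + changed to a -) drops the check from that host entirely.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Host:
    name: str
    # Hypothetical fields: check name -> replacement arguments (the "exception").
    exceptions: Dict[str, str] = field(default_factory=dict)
    # Checks switched from + to - for this host (the "exclusion").
    exclusions: Set[str] = field(default_factory=set)


def effective_checks(template: Dict[str, str], host: Host) -> Dict[str, str]:
    """Return check name -> arguments actually run on this host."""
    result: Dict[str, str] = {}
    for check, args in template.items():
        if check in host.exclusions:
            continue  # excluded: the check does not run on this host at all
        # An exception replaces the template arguments; otherwise keep them.
        result[check] = host.exceptions.get(check, args)
    return result


if __name__ == "__main__":
    template = {
        "CPU Utilisation": "-H $HOSTADDRESS$ -c nsc_checkcpu -a 'warn=90 crit=95 time=10m time=1m ShowAll=long'",
    }
    busy = Host("busy1", exceptions={
        "CPU Utilisation": "-H $HOSTADDRESS$ -c nsc_checkcpu -a 'time=10m time=1m ShowAll=long'",
    })
    print(effective_checks(template, busy))
```

In this model an excluded check produces no results at all, so there would be nothing left to graph; for the original question an exception with the thresholds removed fits better than an exclusion.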

Far less work than the solutions previously stated.

If you need any other help, let me know. I've been working with Opsview / Nagios for 10+ years :D

andym_cv

That's true, overriding or excluding for a single host is simplest that way, but if you have several hosts where you want to do the same thing, maintaining all the individual overrides soon gets tedious.

smarsh

You can also do overrides at the host template level.
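The maintenance trade-off between per-host and template-level overrides can be sketched in Python. This is an illustration under stated assumptions, not a description of how Opsview resolves settings: the precedence order (host override, then template override, then the default arguments), the template name "Busy Servers", and the host names are all hypothetical. The point it shows is that one template-level override covers every host using that template, whereas per-host overrides need one edit per host.

```python
from typing import Dict, Optional

# Check arguments quoted from the thread.
DEFAULT = "-H $HOSTADDRESS$ -c nsc_checkcpu -a 'warn=90 crit=95 time=10m time=1m ShowAll=long'"
NO_ALERTS = "-H $HOSTADDRESS$ -c nsc_checkcpu -a 'time=10m time=1m ShowAll=long'"


def resolve_args(default_args: str,
                 template_override: Optional[str],
                 host_override: Optional[str]) -> str:
    """Pick the most specific setting (assumed order: host > template > default)."""
    if host_override is not None:
        return host_override
    if template_override is not None:
        return template_override
    return default_args


def cpu_args_by_host(host_templates: Dict[str, str],
                     template_overrides: Dict[str, str],
                     host_overrides: Dict[str, str]) -> Dict[str, str]:
    """Resolve the CPU check arguments for every host (host -> template mapping)."""
    return {
        host: resolve_args(DEFAULT,
                           template_overrides.get(template),
                           host_overrides.get(host))
        for host, template in host_templates.items()
    }


if __name__ == "__main__":
    # Hypothetical layout: a single template-level override covers both busy hosts.
    hosts = {"busy1": "Busy Servers", "busy2": "Busy Servers", "web1": "Standard"}
    resolved = cpu_args_by_host(hosts, {"Busy Servers": NO_ALERTS}, {})
    for name, args in resolved.items():
        print(f"{name}: {args}")
```

Adding a third busy host then only means assigning it the "Busy Servers" template, with no new override to maintain.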