How do I close events manually?

j.j.d.e.lammerts

Hi,

In our environment we are working towards a situation where we have no operators anymore.

I have discovered one drawback when monitoring logfiles, no matter what plugin I use.

Picture the following situation: the error OutOfMemory occurs in a logfile. This message occurs only once, and is never followed by a message saying something like "All OK now".

The error is picked up by every check_logfile plugin I've tried, but these plugins remember where they left off reading the logfile. The next check therefore starts from a point after the error, finds no new error, and returns OK. By that time a notification will have gone out to a system engineer, but if he or she misses it, no further notification follows, since no more errors were found in the logfile.
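To make the behaviour described above concrete, here is a minimal sketch of an offset-remembering log check. It is not the code of any actual check_logfile plugin; the function name `scan_new_lines` and the sample log lines are made up for illustration.

```python
def scan_new_lines(lines, offset, pattern="OutOfMemory"):
    """Scan only the lines after `offset`, as an offset-remembering plugin would.

    Returns (status, new_offset). Hypothetical helper, not a real plugin API.
    """
    new_lines = lines[offset:]
    found = any(pattern in line for line in new_lines)
    status = "CRITICAL" if found else "OK"
    return status, len(lines)


# Illustrative log content (assumed, not taken from a real system).
log = ["startup", "java.lang.OutOfMemoryError", "request served"]

# First check: starts at the beginning and sees the error.
status1, offset = scan_new_lines(log, 0)

# The application keeps logging, but the error does not recur.
log.append("request served")

# Second check: resumes after the error, so it reports OK again.
status2, offset = scan_new_lines(log, offset)

print(status1, status2)
```

The second result is OK even though nobody has looked at the problem yet, which is exactly the gap described above.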

It is possible with some plugins to make the error sticky, but even then there's no way to kill the alert in Opsview.

What we would like to achieve is the following:

1) check_logfile plugin monitors a certain logfile

2) plugin detects an error, and sends a notification

3) plugin goes on monitoring, but leaves the error found in a critical state, showing red in Opsview.

4) After a solution has been implemented, the alert in Opsview can be closed by the system operator / engineer.
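The four steps above amount to a "latched" alert: once set, the critical state stays until someone clears it by hand. A minimal sketch of that behaviour, with a hypothetical class name `LatchedAlert` (this models the desired semantics only and is not Opsview functionality):

```python
class LatchedAlert:
    """Alert state that stays CRITICAL until it is cleared manually."""

    def __init__(self):
        self.state = "OK"

    def on_check(self, error_found):
        # A new error sets the latch; a clean check never resets it.
        if error_found:
            self.state = "CRITICAL"
        return self.state

    def clear(self):
        # Step 4: the operator or engineer closes the alert by hand.
        self.state = "OK"


alert = LatchedAlert()
after_error = alert.on_check(True)    # step 2: error detected
after_clean = alert.on_check(False)   # step 3: still red, no new error
alert.clear()                         # step 4: manual close
after_clear = alert.on_check(False)
```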

I've tried closing a critical that had been made sticky in the check_logfile plugin config by submitting a passive result for this check, but after a short while the same critical appears again.

FYI, in our current monitoring system (Tivoli Monitoring) this is possible.

Can this be done at all in Opsview?

If yes, how?

If no, why not?

Regards,

Hans

smarsh
Hi Hans,

Interesting scenario. I think you are blurring the lines a little: Opsview is not a syslog/log monitoring tool and has never pretended to be. It sounds like your requirement is very specific. You want to look for a word, set something critical, and then have it stay critical until an administrator manually changes it back to 'OK', is that correct?

If so, why not create a passive check using NSCA? Have the daemon sit on the same box as the logs and look for the string. When it finds it, it sends a critical error into Opsview, and then an administrator logs in and sends a passive 'clear / OK' message into Opsview. That's all I can think of at the moment.
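A rough sketch of what the submitting side might look like, assuming the standard `send_nsca` client is installed on the log host. `send_nsca` reads tab-separated lines of host, service description, return code and plugin output on stdin. The host name `appserver01`, the service name `Logfile errors`, and the config path are placeholders; the host and service would have to match a passive service check defined in Opsview.

```python
import subprocess

# Standard Nagios-style return codes.
STATUS_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def format_passive_result(host, service, status, output):
    """Build one tab-separated result line in the form send_nsca expects."""
    code = STATUS_CODES[status]
    return f"{host}\t{service}\t{code}\t{output}\n"


def submit(line, server, config="/etc/send_nsca.cfg"):
    """Pipe a result line into send_nsca (config path is an assumption)."""
    subprocess.run(
        ["send_nsca", "-H", server, "-c", config],
        input=line.encode(),
        check=True,
    )


# Sent by the log watcher when it finds the string (placeholder names).
critical_line = format_passive_result(
    "appserver01", "Logfile errors", "CRITICAL",
    "OutOfMemory found in application log",
)

# Sent by the administrator once the problem has been dealt with.
clear_line = format_passive_result(
    "appserver01", "Logfile errors", "OK",
    "Cleared manually by engineer",
)
```

Calling `submit(critical_line, "opsview.example.com")` would send the result, where the server name is again a placeholder. With a passive-only service there is no active check to overwrite the state, so the critical should stay until the OK line is submitted.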

Sam

j.j.d.e.lammerts
Hi Sam,

Thanks for the answer.

Yes, maybe our scenario is a bit specific, and yes, I know I could be asking something that is not Opsview-specific.

Also yes, we figured out just yesterday that we could do this using a passive check. This is exactly what we were looking for. Glad we have a consensus on this one!

Thanks again.

Hans