In our environment we are working towards a situation where we no longer have operators.
I have discovered one drawback when monitoring logfiles, no matter what plugin I use.
Picture the following situation: the error OutOfMemory occurs in a logfile. The message occurs only once and is never followed by a message saying something like "All OK now".
The error is picked up by every check_logfile plugin I've tried, but these plugins remember where they left off reading the logfile. The next check therefore starts from a point after the error, finds no new error, and returns OK. By that time a notification will have gone out to a system engineer, but if they miss it, no further notification follows, since no more errors are found in the logfile.
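To make the behaviour concrete, here is a minimal Python sketch of how I understand offset tracking to work. It is not the code of any real check_logfile plugin: the function name `check_logfile`, the byte-offset handling and the demo helper are all hypothetical, and only the exit codes (0 for OK, 2 for CRITICAL) follow the usual Nagios plugin convention.

```python
import os
import tempfile

OK, CRITICAL = 0, 2  # Nagios-style plugin exit codes


def check_logfile(log_path, offset, pattern="OutOfMemory"):
    """Scan log_path from byte `offset` onwards (hypothetical helper).

    Returns (status, new_offset). Only data written after `offset` is
    inspected, which is why an earlier error is not seen again.
    """
    with open(log_path, "rb") as fh:
        fh.seek(offset)
        new_data = fh.read()
        new_offset = fh.tell()
    status = CRITICAL if pattern.encode() in new_data else OK
    return status, new_offset


def demo_offset_behaviour():
    """Run two consecutive checks against a temporary logfile."""
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "app.log")
        with open(log_path, "w") as fh:
            fh.write("INFO started\nERROR OutOfMemory\n")
        # First check starts at the beginning and sees the error.
        first, offset = check_logfile(log_path, 0)
        with open(log_path, "a") as fh:
            fh.write("INFO request served\n")
        # Second check starts after the error and sees only the new line.
        second, offset = check_logfile(log_path, offset)
        return first, second
```

Calling `demo_offset_behaviour()` gives CRITICAL for the first check and OK for the second, even though nothing was fixed in between.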
It is possible with some plugins to make the error sticky, but even then there's no way to kill the alert in Opsview.
What we would like to achieve is the following:
1) check_logfile plugin monitors a certain logfile
2) plugin detects an error, and sends a notification
3) plugin goes on monitoring, but leaves the error found in a critical state, showing red in Opsview.
4) After a solution has been implemented, the alert in Opsview can be closed by the system operator / engineer.
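The four steps above could be sketched roughly as follows in Python. This is only an illustration of the behaviour we are after, not something Opsview or any existing plugin provides: the state file layout and the names `sticky_check` and `operator_reset` are hypothetical, and the exit codes follow the usual Nagios convention (0 for OK, 2 for CRITICAL).

```python
import json
import os
import tempfile

OK, CRITICAL = 0, 2  # Nagios-style plugin exit codes


def load_state(state_path):
    """Read the (hypothetical) state file, or start with a clean state."""
    if os.path.exists(state_path):
        with open(state_path) as fh:
            return json.load(fh)
    return {"offset": 0, "sticky": False}


def save_state(state_path, state):
    with open(state_path, "w") as fh:
        json.dump(state, fh)


def sticky_check(log_path, state_path, pattern="OutOfMemory"):
    """Steps 1-3: keep reading new log data, but stay CRITICAL once hit."""
    state = load_state(state_path)
    with open(log_path, "rb") as fh:
        fh.seek(state["offset"])
        new_data = fh.read()
        state["offset"] = fh.tell()
    if pattern.encode() in new_data:
        state["sticky"] = True
    save_state(state_path, state)
    return CRITICAL if state["sticky"] else OK


def operator_reset(state_path):
    """Step 4: the engineer closes the alert after fixing the cause."""
    state = load_state(state_path)
    state["sticky"] = False
    save_state(state_path, state)


def demo_sticky():
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "app.log")
        state_path = os.path.join(tmp, "state.json")
        with open(log_path, "w") as fh:
            fh.write("ERROR OutOfMemory\n")
        results = [sticky_check(log_path, state_path)]      # error found
        results.append(sticky_check(log_path, state_path))  # still red
        operator_reset(state_path)                          # manual close
        results.append(sticky_check(log_path, state_path))  # back to OK
        return results
```

In this sketch the check stays CRITICAL across runs until `operator_reset` is called, so the reset has to clear the state the plugin itself keeps.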
I've tried closing a critical that had been made sticky in the check_logfile plugin config by submitting a passive result for this check, but after a short while the same critical appears again.
FYI, in our current monitoring system (Tivoli Monitoring) this is possible.
Can this be done at all in Opsview?
If yes, how?
If no, why not?