We monitor hosts and WAN links at several sites over VPN connections. Each monitored host behind a VPN connection has that site's WAN device set as its parent. Yet when the WAN bounces, we get a storm of notifications from host and service checks before we see the notification that the WAN is down. My understanding of the documentation was that when a host check or service check fails, the parent is checked before a notification is sent for the child, so we should only get a notification for the parent.
Is the documentation wrong? Is the implementation wrong? Does Opsview really run an on-demand parent check before flagging a service or host as down? Or are we at the mercy of check timing? That is, a host or service check fails and a notification is generated, and only later, when the parent's scheduled check finds it down, are the child hosts and services tagged as unreachable.
I have even gone so far as to set my service checks to Max Attempts of 3 at an interval of 2 minutes, host checks to Max Attempts of 3 at an interval of 1 minute, and WAN checks to Max Attempts of 2 at an interval of 1 minute. So it should not be possible for a service check to be flagged while its host is unreachable: the lower-level checks should take much longer to generate a notification, and a host check or parent (WAN) check should fail first.
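To make the timing I expect concrete, here is a rough sketch of the arithmetic. It assumes the intervals above act as retry intervals, that the first failed check counts as attempt 1, and that a notification is only sent once the last attempt fails. `CheckPolicy` and `minutes_to_hard_state` are names I made up for illustration; they are not part of Opsview.

```python
from dataclasses import dataclass


@dataclass
class CheckPolicy:
    """Hypothetical model of one check type's retry settings."""
    name: str
    max_attempts: int
    retry_interval_min: float

    def minutes_to_hard_state(self) -> float:
        # The first failure is attempt 1, so only (max_attempts - 1)
        # retries are needed before the failure is confirmed.
        return (self.max_attempts - 1) * self.retry_interval_min


# Settings taken from the paragraph above.
policies = [
    CheckPolicy("service", max_attempts=3, retry_interval_min=2),
    CheckPolicy("host", max_attempts=3, retry_interval_min=1),
    CheckPolicy("wan", max_attempts=2, retry_interval_min=1),
]

delays = {p.name: p.minutes_to_hard_state() for p in policies}

# Print the check types in the order they would confirm a failure.
for name, minutes in sorted(delays.items(), key=lambda kv: kv[1]):
    print(f"{name}: confirmed about {minutes:g} min after its first failed check")
```

Under those assumptions the WAN check should confirm first, then the host, then the service. The sketch ignores when each check's first failed attempt actually happens relative to the outage, which may be exactly the timing gap I am asking about.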
However, I am now sorting through dozens of notifications caused by a single momentary failure of a WAN link.
Opsview has such great potential, but when we get flooded with messages because a WAN link went down, it's getting more and more difficult to feel that Opsview has any value to us. We just don't have the budget for a Pro license which would allow us to have a proper master/slave setup.