
Notification Storms / Parent relationships

larsen_86022

We monitor hosts and WAN links at several sites over VPN connections. Every monitored host behind a VPN connection has the WAN device set as its parent. Yet when the WAN bounces, we get a storm of notifications from host and service checks before we see the notification that the WAN is down. My understanding of the documentation was that when a host check or service check fails, the parent is checked before a notification is sent for the child. Thus we should only get a notification for the parent.

Is the documentation wrong? Is the code implementation wrong? Does Opsview really run an on-demand parent check before flagging a service or host as being down? Or are we at the mercy of the timing of the checks? That is: a host or service check fails, a notification is generated, and only later, when the parent's scheduled check finds it down, do the child hosts/services get tagged as unreachable.

I have even gone so far as to set my service checks to Max Attempts of 3 at an interval of 2 minutes, host checks to Max Attempts of 3 at an interval of 1 minute, and WAN checks to Max Attempts of 2 at an interval of 1 minute. So it should not be possible for a service check to be flagged while its host is unreachable. The lower-level checks should take much longer to generate a notification, and a host check or parent (WAN) check should fail first.
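To make the timing argument concrete, here is a rough back-of-the-envelope sketch. It assumes the "interval" above is the retry interval, that every check first fails at the same moment, and that a notification is only sent once the last attempt has failed. The function name `minutes_to_hard_state` is made up for illustration and is not part of Opsview.

```python
# Rough model of how long each check type takes to exhaust its attempts.
# Assumption: the first failure counts as attempt 1, and each further attempt
# happens one retry interval later, so the delay is (max_attempts - 1) * interval.

def minutes_to_hard_state(max_attempts: int, retry_interval_min: int) -> int:
    """Minutes from the first failed check until the final attempt fails."""
    return (max_attempts - 1) * retry_interval_min

# (max attempts, retry interval in minutes), taken from the settings above
checks = {
    "service": (3, 2),
    "host": (3, 1),
    "wan": (2, 1),
}

delays = {name: minutes_to_hard_state(*cfg) for name, cfg in checks.items()}

for name, delay in sorted(delays.items(), key=lambda kv: kv[1]):
    print(f"{name}: {delay} min")
# wan: 1 min
# host: 2 min
# service: 4 min
```

Under those assumptions the WAN check should exhaust its attempts first, then the host checks, then the service checks. In practice the checks do not all start failing at the same instant, so scheduling offsets could eat into that margin.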

However, at the moment, I am now sorting through dozens of notifications because of a single momentary failure of a WAN link.

Opsview has such great potential, but when we get flooded with messages because a WAN link went down, it's getting more and more difficult to feel that Opsview has any value to us. We just don't have the budget for a Pro license which would allow us to have a proper master/slave setup.

Thanks, 

Jeff

Pantek
Opsview Core does not respect Parent/Child relationships

We have the same issue. It appears Opsview Core does not respect Parent/Child relationships deeper than one level.

For example in this basic setup:

Opsview Server -> Router 1 -> Switch 1 -> Host 1

If Router 1 is DOWN, it will not alert on Switch 1; however, it will still alert on Host 1. Only the direct relationship appears to be respected.
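For comparison, this is the behavior we would expect from the reachability rule described in the Nagios Core documentation: a host whose check fails is DOWN if at least one parent is UP, and UNREACHABLE if all of its parents are DOWN or UNREACHABLE. The sketch below is only a model of that rule applied to the chain above, not Opsview's actual code; `PARENTS` and `classify` are names invented for the example.

```python
# Model of the expected parent/child reachability rule (an assumption based on
# the Nagios Core documentation, not taken from Opsview's implementation).

PARENTS = {
    "Router 1": ["Opsview Server"],
    "Switch 1": ["Router 1"],
    "Host 1": ["Switch 1"],
}

def classify(host, check_ok, parents=PARENTS):
    """Return 'UP', 'DOWN' or 'UNREACHABLE' for a host.

    check_ok maps each host name to True if its own check succeeded.
    """
    if check_ok[host]:
        return "UP"
    host_parents = parents.get(host, [])
    if not host_parents:
        # No parents (the monitoring server itself): a failure is simply DOWN.
        return "DOWN"
    # DOWN only if some parent is UP; otherwise the failure is upstream.
    if any(classify(p, check_ok, parents) == "UP" for p in host_parents):
        return "DOWN"
    return "UNREACHABLE"

# Router 1 fails, so everything behind it fails its checks as well.
check_ok = {
    "Opsview Server": True,
    "Router 1": False,
    "Switch 1": False,
    "Host 1": False,
}

for name in ("Router 1", "Switch 1", "Host 1"):
    print(name, classify(name, check_ok))
# Router 1 DOWN
# Switch 1 UNREACHABLE
# Host 1 UNREACHABLE
```

With this rule, both Switch 1 and Host 1 would be UNREACHABLE and only Router 1 would be DOWN, so a single notification for Router 1 is what we would hope to see (assuming notifications for the unreachable state are turned off).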

If anyone has a resolution for this please advise us as well.

Thank you,

Richard