Service check goes into UNKNOWN state immediately

cabal95

I am trying to monitor a couple of Panasonic projectors that use the PJLink protocol. The projectors lose their network connection a couple of times a day for about 30 seconds (this isn't a network issue but an issue with the device itself). This wasn't a problem until one of them went into Critical mode (it needs a new bulb) and has sat in that mode for a few weeks (we are waiting for the new year to buy a new bulb). We acknowledged the Critical alert, and that is when we started noticing the problem.

It seems that while it takes X number of checks for a service check to go into "notified" critical mode, the service check goes into UNKNOWN state as soon as a single check can't contact the projector, which happens occasionally because the projectors lose connection randomly. Then when it "recovers" (i.e. successfully contacts the projector and sees that it is still in CRITICAL state), I get another alert and the acknowledgement goes away.

I suppose technically this is correct, but is there a way for me to work around that for these devices that have poor network connectivity? What I would love is a "soft UNKNOWN" state, so that it doesn't clear acknowledgements until it has been in that unknown state for X number of checks. Kind of like the way something can go CRITICAL but I don't hear about it until it has stayed critical for X number of checks.

I know the "correct" solution is to fix the device, but since it is an internal issue I don't think we can do anything about it. I have looked and can't find any firmware updates (or even a way to update the firmware), so I think we are stuck with them.

I can acknowledge the item as sticky, but that requires me to remember to "unsticky" it when we do change the bulb. I can also schedule downtime, but since we don't know yet when the bulb will actually be replaced, I don't have a definite period of time to use.

Any suggestions would be appreciated, thanks!

awijntje

Hi there,

I believe a sticky acknowledgement stays in place across all non-OK states, so it is only removed when the device fully recovers (i.e. goes to OK).

From the docs site (see: http://docs.opsview.com/doku.php?id=opsview-core:acknowledgements):

"With a normal acknowledgement, when a host or service changes state, the acknowledgement is cleared. With a sticky acknowledgement, only when the host or service returns to an UP or OK state will the acknowledgement be cleared."
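To make the difference concrete, here is a small sketch that replays a sequence of check results under the two rules quoted above. It is only a model of the documented behaviour, not Opsview code; the function name ack_after and the sample state sequence are made up for illustration.

```python
# Toy model of the acknowledgement rules quoted from the docs.
# Not Opsview code: ack_after and the sample sequence are hypothetical.
OK_STATES = {"OK", "UP"}


def ack_after(states, sticky):
    """Replay check results and report whether the acknowledgement survives.

    states[0] is the state in which the problem was acknowledged.
    """
    acked = True
    previous = states[0]
    for current in states[1:]:
        if current in OK_STATES:
            acked = False  # both kinds of acknowledgement clear on recovery
        elif current != previous and not sticky:
            acked = False  # a normal acknowledgement clears on any state change
        previous = current
    return acked


# Projector stays CRITICAL, drops off the network once, then comes back.
flapping = ["CRITICAL", "CRITICAL", "UNKNOWN", "CRITICAL"]

print(ack_after(flapping, sticky=False))          # False
print(ack_after(flapping, sticky=True))           # True
print(ack_after(flapping + ["OK"], sticky=True))  # False
```

The last line is the bulb-replacement case: once the service returns to OK, the sticky acknowledgement is cleared as well, so there is nothing to "unsticky" by hand.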

So in this case a sticky acknowledgement would solve your issue (as I assume the device goes to OK/UP when you replace the bulb).

Hope this helps,
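If you would rather script the acknowledgement than click through the UI, the underlying Nagios external command is ACKNOWLEDGE_SVC_PROBLEM, where a sticky value of 2 means "keep until OK". The sketch below only builds the command line. Whether your Opsview install lets you write to the Nagios command pipe is an assumption you would need to check, and the host name, service description and comment are placeholders.

```python
import time


def build_ack_command(host, service, author, comment,
                      sticky=True, notify=True, persistent=True, now=None):
    """Build a Nagios ACKNOWLEDGE_SVC_PROBLEM external command line.

    Field order: host;service;sticky;notify;persistent;author;comment.
    sticky=2 keeps the acknowledgement until the service returns to OK.
    """
    timestamp = int(time.time() if now is None else now)
    fields = [
        "ACKNOWLEDGE_SVC_PROBLEM",
        host,
        service,
        "2" if sticky else "1",
        "1" if notify else "0",
        "1" if persistent else "0",
        author,
        comment,
    ]
    return "[{}] {}".format(timestamp, ";".join(fields))


# Placeholder host/service names; the timestamp is fixed so the output is stable.
print(build_ack_command("projector-1", "PJLink status", "cabal95",
                        "Waiting for new bulb", now=1356998400))
# [1356998400] ACKNOWLEDGE_SVC_PROBLEM;projector-1;PJLink status;2;1;1;cabal95;Waiting for new bulb
```

The resulting line would be appended to the Nagios command file; its path depends on the installation, so it is left out here.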

Alan
