CRITICAL service but Host shows as UP

d2sv5f5y

Hello,

Please see the attached image. It shows a Host which is displaying as UP even though two of its service checks are CRITICAL. How does Opsview decide whether a Host is UP? In this case we would like it to show as DOWN!

The Host is a Windows Server running the Opsview Agent.

Please let me know if you need more details.

Thanks,

Mark.

PeterPlate

Mark,

I'll try to answer your question to the best of my knowledge.

A host status of UP means that the host check command is returning an OK response. Check that you are using the correct host check command, or configure one if none is set.

If the host check you want is not in the list, add it under Advanced > Host Check Commands.

The host status will then depend on the result of that check.

smarsh

As said above, a host check simply checks whether a host is up, i.e. do we get a ping response, SSH response, etc. from it. Service checks (how much free disk space the C: drive has, for example) are separate entities entirely.
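To illustrate the separation described above, here is a minimal sketch in Python. It is a toy model, not Opsview's implementation: the `Host` class, its fields, and the host and service names are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Toy model of a monitored host (hypothetical, not an Opsview class)."""
    name: str
    host_check_ok: bool  # result of the host check command (e.g. ping, NRPE)
    services: dict = field(default_factory=dict)  # service name -> state

    def status(self) -> str:
        # Only the host check decides UP/DOWN; service states are not consulted.
        return "UP" if self.host_check_ok else "DOWN"


web = Host(
    name="webserver",
    host_check_ok=True,
    services={"HTTP": "CRITICAL", "HTTPS": "CRITICAL", "C: drive": "OK"},
)
print(web.status())  # UP
```

Two CRITICAL services leave the host status at UP, because the status is computed from the host check result alone.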

d2sv5f5y

OK, I've looked at the "host check command" for the server in question and it's set to NRPE (on port 5666).

So I suppose my next question is: how does NRPE decide if a host is UP or DOWN?

awijntje

Hi there,

Going over your screenshot, I have the following question.

Why do you think the host should be DOWN (and not UP)?

Two service checks being CRITICAL means there is something wrong with those services (for instance, a disk is full).

So what are the service checks for, and what are they monitoring?

Hope this helps,

Alan

d2sv5f5y

I've just attached a screenshot of the service checks (the file called Capture2.JPG in the top post). At the time of the problem I believe it was the two HTTP checks that were CRITICAL.

This is our web server, so if HTTP is CRITICAL we would like the host to show as DOWN.

Similarly, if the C or D drive showed as CRITICAL we would like the host to show as DOWN.
smarsh

This isn't possible: the host is not DOWN. The host is UP, and it is HTTP that is down, for example. "DOWN" relates to whether the host responds to the host check, via ICMP/NRPE/SSH/FTP for example. This is specified in the host settings, under "HOST CHECK COMMAND". To check Apache, for example, you can change the host check command to "HTTP (80)".
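As a rough illustration of a port-based host check, the Python sketch below reports UP when a TCP connection to the chosen port succeeds and DOWN otherwise. This is an assumption-laden simplification: the real NRPE and HTTP check plugins do more than open a connection, and `host_state` is a hypothetical helper, not part of Opsview.

```python
import socket


def host_state(address: str, port: int, timeout: float = 3.0) -> str:
    """Return "UP" if a TCP connection to address:port succeeds, else "DOWN".

    Simplified stand-in for a host check command: it only tests that
    something accepts connections on the port, nothing more.
    """
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return "UP"
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return "DOWN"
```

Under this model, checking port 5666 answers "is the NRPE agent reachable?", while checking port 80 answers "is the web server accepting connections?". Which question defines DOWN depends entirely on the host check command that is selected.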

d2sv5f5y

smarsh, if the "HOST CHECK COMMAND" is set to NRPE, which service checks need to go CRITICAL for the host to show as DOWN?

awijntje

Hi there,

If NRPE is the host check, then NRPE has to be down before the host is considered to be DOWN.

Aside from this, I would like to point out that you are effectively trying to bypass a very important (and useful) feature of Opsview, root-cause analysis, and I would strongly urge you to reconsider the "if serviceA fails the host should be down" approach.

A host DOWN is always used to signify that the host has NO working connection to the outside world/network, and most likely means something like a power outage, defective network cables, a broken switch, or even misconfigured NICs/switches.

A service failure signifies that an application or program is not working correctly, but you can still reach the host and fix it remotely.

Another reason not to go with your approach is that when a host is DOWN all checks are suspended until the host is UP again (meaning you'll miss information that is not gathered during this period, causing all kinds of havoc when the host is back UP).
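The effect of suspended checks can be sketched with a small Python toy model. It only mirrors the behaviour described in the paragraph above and is not Opsview's scheduler; `run_cycle` and the check names are hypothetical.

```python
def run_cycle(host_is_up, service_checks):
    """Run one monitoring cycle and return a result per service.

    When the host is DOWN the service checks are skipped, so no data
    is gathered for that cycle (represented here by None).
    """
    if not host_is_up:
        return {name: None for name in service_checks}
    return {name: check() for name, check in service_checks.items()}


# Hypothetical checks that always return a fixed state.
checks = {"HTTP": lambda: "CRITICAL", "C: drive": lambda: "OK"}

print(run_cycle(True, checks))   # {'HTTP': 'CRITICAL', 'C: drive': 'OK'}
print(run_cycle(False, checks))  # {'HTTP': None, 'C: drive': None}
```

If a failing HTTP service were allowed to mark the host DOWN, the second case would apply and the disk check would stop reporting as well, even though the server is still reachable.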

If you need more information on this kind of setup, have a look at labs.opsview.com (the article on monitoring multi-homed servers), which explains some of these concepts (and google around for things like dependencies).

Hope this helps,

Alan
