
How to write a simple service check to see if a process is running.

sdmike

I have a Snort box at work, and all I want to do is check whether Snort is running. This is the service check I have:

check_nrpe -H $HOSTADDRESS$ -c check_procs -a '-C snort -w 1: -c 1:'

Opsview alerts me with "PROCS WARNING: 129 processes", but all I want is an alert when snort goes down. What am I missing here? Cheers!

dstein
Re: How to write a simple service check to see if a process ...

On the face of it, your syntax looks correct to me.  

This makes me wonder whether or not what is running is actually what you THINK is running.  Try running (as nagios) /usr/local/nagios/utils/get_actual_command <host>.  This will dump the fully substituted command lines that are actually getting run for each service check on that host.  Find your snort check and confirm that it is exactly as shown below, with proper appearance of single quotes, spacing, your host name, etc.  

/usr/local/nagios/libexec/check_nrpe -H <your_host> -c check_procs -a '-C snort -w 1: -c 1:'

Those are single quotes in my example.  
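
A small sketch of why the single quotes matter. count_args is a hypothetical helper, not part of Nagios or NRPE; it only shows how the shell splits the text that follows -a, assuming a POSIX-style shell.

```shell
# count_args (hypothetical helper) prints how many separate arguments it received.
count_args() {
    echo "$#"
}

# Quoted: the whole threshold string reaches the program as ONE argument after -a.
count_args -a '-C snort -w 1: -c 1:'    # prints 2

# Unquoted: the same text is split on spaces into separate arguments.
count_args -a -C snort -w 1: -c 1:      # prints 7
```

If the quotes get lost somewhere between the service definition and the command that is finally executed, the remote check may not see the arguments you intended.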

sdmike
Re: How to write a simple service check to see if a process ...

This was the output:

PROCS OK: 1 process with command name 'snort'
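
For reference, a rough sketch of how the '-w 1: -c 1:' thresholds read under the standard Nagios plugin range format, where "1:" means "alert when the value is below 1". status_for_count is a hypothetical helper that only mimics this logic; it is not how check_procs is implemented.

```shell
# status_for_count (hypothetical helper) mimics the effect of '-w 1: -c 1:'.
# A process count below 1 falls outside both ranges, so the result is CRITICAL.
status_for_count() {
    count="$1"
    if [ "$count" -lt 1 ]; then
        echo "CRITICAL"   # critical wins when both thresholds are breached
        return 2          # Nagios plugin exit code for CRITICAL
    fi
    echo "OK"
    return 0              # Nagios plugin exit code for OK
}

status_for_count 1                            # one snort process: prints OK
status_for_count 0 || echo "exit code: $?"    # snort is down: prints CRITICAL, then exit code: 2
```

With these thresholds a single running snort process is OK, and the check only alerts when no snort process is found.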

dstein
Re: How to write a simple service check to see if a process ...

Your output in comment #2 looks correct: the result shows the status of the one service you care about. However, since you didn't say "problem solved", I assume you ran check_nrpe manually and got this correct result, but you still aren't getting the same thing in the GUI when Opsview runs what should be the same test?

If so, look again at the suggestion in comment #1 about using the get_actual_command utility. If your manual test from the command line works but the automated test doesn't, Opsview isn't submitting the same command line you typed. Use get_actual_command to review what it is *really* running; you should be able to spot the discrepancy and use it to fix the service definition.

sdmike
Re: How to write a simple service check to see if a process ...

I ran the command and I received this message:

Class::C3::Componentised::load_components(): Use of DBIx::Class::UTF8Columns is strongly discouraged. See documentation of DBIx::Class::UTF8Columns for more info
 

I have 35 checks running on that system.

dstein
Re: How to write a simple service check to see if a process ...

Sounds like you ran the command as a user other than 'nagios'. However, even if you ran it as root, it should still have printed a list of all the command lines that get executed for the 35 checks you're running. At least, that's what happens on my system. If it didn't on yours, execute "su - nagios" and re-run get_actual_command.

Find the command line that invokes check_nrpe for your snort test and compare what was printed with the version in comment #1. They should be identical (once you put your hostname into the version shown in comment #1). If they aren't, you'll need to update the service check in Opsview until the result printed by get_actual_command is correct.

The bottom line is that you've proven to yourself that you can manually execute a check_nrpe command that gives you the desired result.  Now you simply have to determine how the syntax of the command Opsview is running differs and correct your service check.
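
The comparison can be sketched in a few lines of shell. Everything here is illustrative: same_command is a hypothetical helper, "myhost" is a placeholder host name, and the "actual" line is an invented example of a mismatch, not output taken from this thread.

```shell
# same_command (hypothetical helper) succeeds only if two command lines match exactly.
same_command() {
    if [ "$1" = "$2" ]; then
        echo "identical"
    else
        echo "different"
        return 1
    fi
}

# The command line that worked by hand (myhost is a placeholder).
manual="check_nrpe -H myhost -c check_procs -a '-C snort -w 1: -c 1:'"

# Invented example of what get_actual_command might reveal: the -a argument is missing.
actual="check_nrpe -H myhost -c check_procs"

same_command "$manual" "$actual" || true   # prints "different"
```

Paste the real line printed by get_actual_command in place of the invented one; any difference in quoting, spacing, or arguments is what needs fixing in the service check.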
