How Our Plugin Wrapper Helps You Deal With Long-Running Service Checks

By dchatterton

Why would you use the Plugin Wrapper?

Long-running service checks cause a great deal of frustration, especially checks that run long database queries or custom scripts, or that scan large directories.

By default, Opsview plugins have only 60 seconds to run before they time out. This prevents service checks from running too long and using too many resources, but what if you want to run a check that takes two minutes, or fifteen, or even more?

To prevent long-running service checks from timing out, our Plugin Wrapper runs the service check, collects all of its data, then submits the result to Opsview as a passive check. You can use the wrapper for checks that exceed the standard NRPE timeouts, and schedule it as often as suits the check, for example once a day or every hour.

How is this different from a Plugin Check?

By default, a Plugin Check in Opsview runs every five minutes and times out after 60 seconds. The Plugin Wrapper, by contrast, lets the plugin run to completion, then passes the collected data to an Opsview passive check, so it is not limited by a timeout.

How to configure the Plugin Wrapper?

The Plugin Wrapper script can be downloaded here.

To use this script, copy it into your /usr/local/nagios/libexec/ folder, make it executable, and make sure it is owned by the Nagios user.
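The installation step above can be sketched as a small shell function. This is only an illustration: the function name install_wrapper is hypothetical, and the user name "nagios" in the usage comment is an assumption about your system, so check which account runs your checks before using it.

```shell
# Sketch: install a downloaded wrapper script into the plugin directory.
# Usage: install_wrapper SRC DEST_DIR [OWNER]
# install_wrapper is a hypothetical helper, not part of Opsview.
install_wrapper() {
    src=$1
    dest_dir=$2
    owner=${3:-}

    # Copy the script into place and make it executable.
    cp "$src" "$dest_dir/plugin_wrapper" || return 1
    chmod 755 "$dest_dir/plugin_wrapper" || return 1

    # Changing ownership normally requires root, so it is only
    # attempted when an owner is given explicitly.
    if [ -n "$owner" ]; then
        chown "$owner" "$dest_dir/plugin_wrapper" || return 1
    fi
}

# Typical use, run as root (the user name "nagios" is an assumption):
# install_wrapper ./plugin_wrapper /usr/local/nagios/libexec nagios
```

Leaving the owner argument off lets you try the copy and permission steps in a scratch directory before touching the real plugin folder.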


Next, in the Opsview interface, navigate to Settings > Service Checks and select “Passive Check” from the “Add New” drop-down menu. Name the service check and fill out any further details you would like it to include. Keep a note of the service check name you choose, as you will need to reference it later when running the command. Once complete, submit your changes. For more information on passive check configuration, please see these documents.

Now that the service check has been named, you can add it to the desired host and reload Opsview.

Back on the command line, you can create a command and test it to see the results in Opsview. As an example, we will use the check_http plugin, although in practice you would use the wrapper with a longer-running check. Here is an example command:

./plugin_wrapper -p check_http -a '-H -p 80' -s 'Plugin Example' -H localhost

The first argument, -p, specifies the plugin you wish to use. Next, -a specifies all the arguments you want to pass along to the plugin; these must be surrounded by single quotes so that they reach the wrapper as a single value. -s specifies the name of the passive service check you created earlier; if the name contains a space, it must also be surrounded by single quotes. Lastly, -H specifies the host on which the check is running. If you are running the wrapper for the first time, or are uncertain of the results it is giving, add the -d argument to see debugging information: the command being run, its output and its exit code.
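To see why the single quotes matter, the following sketch prints each argument exactly as a program would receive it. The helper show_args and the plugin arguments '-w 5 -c 10' are hypothetical and used only for illustration; they are not part of the Plugin Wrapper.

```shell
# show_args prints every argument it receives on its own line, in brackets.
# It is a hypothetical helper for illustrating shell quoting.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

# Quoted: the plugin arguments arrive as one value after -a,
# and the service name arrives as one value after -s.
show_args -a '-w 5 -c 10' -s 'Plugin Example'

# Unquoted: the shell splits everything into separate words
# before the program ever sees them.
show_args -a -w 5 -c 10 -s Plugin Example
```

In the quoted call, -a is followed by a single bracketed value; in the unquoted call, each word appears in its own brackets, which is how a wrapper would misread them as its own options.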

After you run that command, you will be able to see the results in Opsview. If the command is running as expected, you can then add it to the Nagios crontab to run regularly. For example, if I wanted this to run every hour, my Cron file would have an entry that looked like this:

1 * * * * /usr/local/nagios/libexec/plugin_wrapper -p check_http -a '-H -p 80' -s 'Plugin Example' -H localhost

For more details about Cron, this Wikipedia page contains lots of helpful information.

Use cases:

A good use case for the Plugin Wrapper is a long-running database query. Let's say that the example query takes 10 minutes to run and you have already added it to Cron like this:

1 * * * * /usr/local/nagios/libexec/plugin_wrapper -p check_mysql_query -a 'sql query, credentials and other arguments' -s 'query example' -H localhost

The command above lets the check run in the background and pass data along only once it has been collected. You can schedule the Cron job as often as you like, provided the interval between runs is longer than the time the check takes to complete. If the job runs more frequently than that, multiple instances of the process will start and could cause performance issues on the system or even cause the system to crash.
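One way to guard against overlapping runs is to take a lock before starting the check and skip the run if the lock is already held. The sketch below is not part of the Plugin Wrapper; run_once is a hypothetical helper, the lock location is an assumption, and it relies on mkdir being atomic.

```shell
# run_once LOCKDIR COMMAND [ARGS...]
# Runs COMMAND only if LOCKDIR does not already exist. Creating a
# directory is atomic, so only one caller can acquire the lock.
# run_once is a hypothetical helper, not part of Opsview.
run_once() {
    lockdir=$1
    shift

    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "previous run still active, skipping" >&2
        return 75   # arbitrary code chosen here to mean "skipped"
    fi

    # Run the command, remember its exit code, then release the lock.
    "$@"
    status=$?
    rmdir "$lockdir"
    return "$status"
}

# Demo with a harmless command and a throwaway lock path.
run_once "${TMPDIR:-/tmp}/run_once_demo.$$" echo "check finished"
```

In a Cron entry, the full plugin_wrapper command would be passed as the COMMAND arguments. Note that if the process is killed before it releases the lock, the directory is left behind and has to be removed by hand.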

Because this script submits results directly to the Nagios pipe, it can be run only on the master or a slave server, not on a remote system.
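For context, a passive service result written to the Nagios command pipe uses the standard Nagios external command PROCESS_SERVICE_CHECK_RESULT. The sketch below only builds such a line and prints it; it is not the wrapper's actual code, and format_passive_result is a hypothetical helper name.

```shell
# format_passive_result HOST SERVICE RETURN_CODE OUTPUT [TIMESTAMP]
# Builds a Nagios external command line for a passive service result.
# TIMESTAMP defaults to the current time in seconds since the epoch.
# format_passive_result is a hypothetical helper for illustration.
format_passive_result() {
    ts=${5:-$(date +%s)}
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "$ts" "$1" "$2" "$3" "$4"
}

# Example: run a command, capture its output and exit code,
# then build the result line from them.
output=$(echo "OK - example output")
code=$?
format_passive_result localhost 'Plugin Example' "$code" "$output"

# A real submission would append this line to the Nagios command pipe,
# whose path depends on the installation, so it is not written here.
```

The return code follows the usual plugin convention: 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN. The service name must match the passive check configured in Opsview.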