Extending the Opsview Agent
Opsview Monitor can watch many different types of hosts, from networking devices to servers. For many server platforms we provide an agent package which, once installed, extends the available monitoring: it reports what is happening to the server 'on the inside', rather than relying only on checks run from outside the host against the services the host provides.
The agent we provide comes pre-configured for use with the Host Templates provided, but it has been created in such a way that it can be extended to provide more functionality.
This document covers extending the agent; installation of the agent is already covered in our public documentation.
Stock Opsview Agent details
The stock Opsview Agent installs all files under /usr/local/nagios (this may change to /opt/opsview/agent in the future), with all files owned by the user 'nagios', group 'nagios'. Six directories may be created in this area:
- bin - location of the 'nrpe' daemon binary - when running, the daemon listens for connections from the Master or Slave Nodes, runs the plugin and returns the result
- etc - location of the main configuration file 'nrpe.cfg'
- lib - optional directory containing supplementary files for the daemon
- libexec - location of plugins the daemon calls to do the actual work
- perl - supplementary library files used by Perl based plugins
- var - directory used by some plugins to store persistent data
The file etc/nrpe.cfg contains all the configuration that the nrpe daemon needs to run, such as which port to listen on, which IP addresses are allowed to contact it, which checks are available to run, and how long a check may run before the daemon times out and kills it. This file should not be changed, as it is overwritten every time the agent package is upgraded.
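To see which of these settings are currently in effect, something like the following sketch may help. The function name 'show_nrpe_settings' is hypothetical, and the directive names ('server_port', 'allowed_hosts', 'command_timeout', 'command[...]') are assumed to be the standard NRPE ones; the file is only read, never modified.

```shell
#!/bin/bash
# Hypothetical helper: list the active (uncommented) settings discussed above.
# Directive names are assumed to be the standard NRPE ones.
# The default path is the stock location; pass another path to override it.
show_nrpe_settings() {
    local cfg="${1:-/usr/local/nagios/etc/nrpe.cfg}"
    grep -E '^(server_port|allowed_hosts|command_timeout|command\[[^]]+\])=' "$cfg"
}
```

Calling 'show_nrpe_settings' with no argument reads the stock nrpe.cfg; passing the path of a file under etc/nrpe_local/ shows any local additions in the same way.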
Adding in a new check
In this example, we will create a check called 'check_test' on a host being monitored which will return an OK status along with the text 'Check ran okay'.
To start with, log in to the Opsview Master or Slave Node and confirm that the command is not yet defined. Assuming the monitored host on which we will create the check is 192.168.15.218, run:
$ /usr/local/nagios/libexec/check_nrpe -H 192.168.15.218 -c check_test
NRPE: Command 'check_test' not defined
To create the plugin, log in to the monitored host as root and create the file /usr/local/nagios/libexec/check_test.pl containing the following:
#!/bin/bash
echo "Check ran okay"
exit 0
Note: There are full details on how to write checks at https://www.monitoring-plugins.org/doc/guidelines.html
Make this script executable by running:
# chmod +x /usr/local/nagios/libexec/check_test.pl
And then run it to prove that it works as expected:
# /usr/local/nagios/libexec/check_test.pl
Check ran okay
# echo $?
0
The next step is to configure the agent to associate the command 'check_test' with the plugin to be executed, 'check_test.pl'. To do this, create an additional configuration file in the directory /usr/local/nagios/etc/nrpe_local/ (it is safe to create this directory if it does not already exist). Any file in this directory with the suffix '.cfg' is read after nrpe.cfg; do not make changes to nrpe.cfg itself, as they will be lost when the agent is upgraded.
Create a file '/usr/local/nagios/etc/nrpe_local/local.cfg' containing the line:
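Assuming the standard NRPE command-definition syntax ('command[name]=command line'), and using the command name and plugin path from this example, the line would take the following form:

```cfg
# Assumed form, based on standard NRPE command-definition syntax
command[check_test]=/usr/local/nagios/libexec/check_test.pl $ARG1$
```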
The '$ARG1$' is important as this takes any arguments provided by Opsview Monitor and passes them directly to the plugin being called. Without this, the arguments configured in Opsview Monitor will be ignored.
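To illustrate how a plugin might use the arguments passed through '$ARG1$', here is a minimal sketch of a threshold check. The function name 'check_value' and its three arguments are hypothetical; the return codes follow the usual plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) described in the guidelines linked above.

```shell
#!/bin/bash
# Hypothetical plugin logic: compare a measured value against thresholds.
# Return codes: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
check_value() {
    local value="$1" warn="$2" crit="$3"
    local re='^[0-9]+$'
    # Missing or non-numeric input is reported as UNKNOWN.
    if ! [[ "$value" =~ $re && "$warn" =~ $re && "$crit" =~ $re ]]; then
        echo "UNKNOWN - usage: check_value VALUE WARN CRIT"
        return 3
    fi
    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL - value is $value"
        return 2
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING - value is $value"
        return 1
    fi
    echo "OK - value is $value"
    return 0
}
```

A real plugin would measure something, call the function with the thresholds it received as arguments, and exit with the function's return code.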
The agent can now be restarted, typically by running:
# /etc/init.d/opsview-agent restart
Note: This depends on your host (e.g. 'service opsview-agent restart' might be more appropriate).
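As a rough sketch of how that choice might be scripted, the hypothetical helper below prints the restart command suited to the host, preferring 'service' when it is available and falling back to the init script. The function name is an illustration only, and only the two commands mentioned above are considered.

```shell
#!/bin/bash
# Hypothetical helper: print (not run) the restart command for this host.
agent_restart_command() {
    if command -v service >/dev/null 2>&1; then
        echo "service opsview-agent restart"
    else
        echo "/etc/init.d/opsview-agent restart"
    fi
}
```

Printing the command rather than running it lets you review it first; run the printed command as root to restart the agent.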
You can now re-run the test on the Opsview Master or Slave Node to confirm that the new plugin works as expected:
$ /usr/local/nagios/libexec/check_nrpe -H 192.168.15.218 -c check_test
Check ran okay
$ echo $?
0
The local.cfg file may also be used to reconfigure the agent. For example, the default timeout for the agent running a plugin is 60 seconds which, for some plugins, may not be long enough. To increase the timeout to 2 minutes you could add the following and restart the agent:
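Assuming the timeout in question is NRPE's standard 'command_timeout' directive, whose value is given in seconds, the addition would look like this:

```cfg
# Assumed directive name (standard NRPE); 120 seconds = 2 minutes
command_timeout=120
```

Note that the check_nrpe call on the Master or Slave Node has its own timeout (the '-t' option), which may also need to be raised for long-running plugins.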
However, be aware that changing some of these settings may prevent the daemon from starting or running correctly.