Monitoring Apache Solr with Opsview
A checklist for service checks
Solr is built on Lucene and follows the same layout: an index contains documents, which are composed of fields. As part of the value it adds over Lucene as a search service, Solr provides several useful ways of obtaining health status and monitoring metrics:
- Health-check status using the /admin/ping handler
- The admin statistics page /admin/stats.jsp (XML styled with XSL)
- JMX MBeans
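As an illustration of the first item, the sketch below interprets the body returned by the /admin/ping handler. It is a minimal example rather than the check_solr implementation: the sample XML is an assumed response shape (it may differ between Solr versions), and the function name ping_is_ok is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed shape of a /admin/ping response body; verify against your Solr version.
SAMPLE_PING = """<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">2</int>
  </lst>
  <str name="status">OK</str>
</response>"""


def ping_is_ok(xml_text):
    """Return True when the top-level status string in the response is OK."""
    root = ET.fromstring(xml_text)
    # Look only at direct children of <response>, not the responseHeader block.
    status = root.find("./str[@name='status']")
    return status is not None and (status.text or "").strip() == "OK"


if __name__ == "__main__":
    print("ping ok:", ping_is_ok(SAMPLE_PING))
```

In a real check, the XML would be fetched over HTTP from the Solr server and a failed or malformed response would be reported as a critical state.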
The applicable checks could be classified as either health checks or data-gathering checks, but the two categories would overlap considerably. Instead, the list is divided into checks that can be performed remotely (without an agent installed on the server) and those that are best performed locally on the Solr server.
Remote (agent-less) checks
What should we look for over the network? First, a host-level check can perform a network-level ping. Next, we can check TCP connectivity to the servlet container port, then make an HTTP GET request to the Solr ‘front page’ and check for a known string (e.g. Welcome to Solr). Having reached the application layer, we can start to perform Solr-specific checks. Items to monitor may include (delete as applicable):
- Ping status
- Number of docs
- Number of queries / queries per second
- Average response time
- Number of updates
- Cache hit ratios
- Replication status
- Synthetic queries
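Most of the metrics listed above end up being compared against warning and critical thresholds. The sketch below shows one way to map a metric value to a Nagios-style state and an output line with performance data. The exit codes (0, 1, 2) and the "label=value;warn;crit" performance data layout follow the standard Nagios plugin conventions; the function name, metric labels and threshold values are hypothetical and are not taken from check_solr.

```python
OK, WARNING, CRITICAL = 0, 1, 2
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def evaluate(label, value, warn, crit, higher_is_worse=True):
    """Return (exit_code, output_line) for one metric.

    higher_is_worse=True suits response times; False suits cache hit ratios,
    where a low value is the problem.
    """
    if higher_is_worse:
        state = CRITICAL if value >= crit else WARNING if value >= warn else OK
    else:
        state = CRITICAL if value <= crit else WARNING if value <= warn else OK
    # Text before the pipe is for humans; text after it is performance data.
    line = "SOLR %s - %s=%s | %s=%s;%s;%s" % (
        STATE_NAMES[state], label, value, label, value, warn, crit)
    return state, line


if __name__ == "__main__":
    code, line = evaluate("avg_response_ms", 120, 100, 500)
    print(line)
    raise SystemExit(code)
```

Returning the state and output line from a function, instead of printing and exiting directly, keeps the threshold logic easy to test in isolation.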
Installing an Opsview agent on the Solr server means we can run additional checks over NRPE (Nagios Remote Plugin Executor). These could be operating-system-level checks such as memory/disk utilisation or CPU load, or the following:
- Java servlet container process is running
- JMX checks e.g. heap memory or custom MBeans
- File age
- Log parsing for exceptions
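The last two items can be approximated with a few lines of scripting run by the agent. The sketch below is a simplified illustration, not an existing Opsview plugin: the function names are hypothetical, and the regular expression is a rough heuristic for Java exception and error class names that should be tuned to your own log format.

```python
import os
import re
import time

# Heuristic: a word ending in Exception or Error, optionally package-qualified.
EXCEPTION_PATTERN = re.compile(r"\b[\w.]*(?:Exception|Error)\b")


def count_exceptions(lines):
    """Count the log lines that mention a Java exception or error class."""
    return sum(1 for line in lines if EXCEPTION_PATTERN.search(line))


def file_age_seconds(path, now=None):
    """Return the number of seconds since the file was last modified."""
    now = time.time() if now is None else now
    return now - os.path.getmtime(path)


if __name__ == "__main__":
    sample = [
        "INFO: [] webapp=/solr path=/select params={q=solr} hits=10 status=0 QTime=3",
        "SEVERE: org.apache.solr.common.SolrException: undefined field text",
    ]
    print("exception lines:", count_exceptions(sample))
```

A file age check of this kind could, for example, alert when an index or log file has not been modified within an expected interval.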
The Solr wiki describes how to configure JMX support: http://wiki.apache.org/solr/SolrJmx.
Install the Solr plugin from https://github.com/rbramley/Opsview-solr-checks into /usr/local/nagios/libexec/. The check_solr plugin was developed in Perl so that it could be contributed back to Opsview. It requires the CPAN XML::XPath module (sudo cpan -i XML::XPath). The plugin includes usage instructions (check_solr -h), which can also be viewed in Opsview by selecting the ‘Show Plugin Help’ link beneath the Plugin drop-down (see Figure 1). The -u option can be used to specify the URL path, which allows for multiple Solr set-ups.
Service check setup
Figure 1 gives an example of a service check configuration.
Figure 2 shows the agentless service check group with plugins and their arguments.
Figure 3 shows a simplistic host setup with a ping check.
Figure 4 is an extract from the Monitors tab, where we select the checks we want performed for the current host.
The check results shown in Figure 5 are visible by navigating through the host group hierarchy.
Clicking on the graph icon for Solr Cache Hit Ratios drills down to the graph shown in Figure 6. Clicking on the graph icon for Solr Avg Response Time – standard takes you to the graphs in Figure 7.
There are a few other plugins available for monitoring Solr from Opsview, depending on your needs:
- http://code.google.com/p/nagios-plugins-shamil – provides ping, replication status and num docs
- http://code.google.com/p/solr-nagios-check – provides QPS, response time and num docs
Also, chapter 8 of the recently published Apache Solr 3 Enterprise Search Server book includes a section on Monitoring Solr Performance.
Using check_solr in conjunction with Opsview allows you to ensure that your Solr server is available and provides you with metrics that can help you tune your Solr configuration. This can be complemented with additional agent-based operating system and JMX checks to give you a full picture.