The latest release of our free monitoring tool - Opsview Core - is now available to download.
This release has had three major objectives:
- Upgrading Opsview's core engine to run Nagios 4
- Significantly speeding up reload times
- Completely rewriting the database import component, NDOutils
The key areas of development, as highlighted in the overview diagram:
- Core engine: load average reduced by 60%
- Reload times: reduced by up to 50%
- Database import: reduced by 40%
The Opsview reload time includes time taken to generate the configuration, validate it and send it to all slave systems. As the configuration generation time is usually the longest part, we've focused our efforts there.
Using the amazing Perl profiling tool NYTProf, we were able to identify the specific routines that were taking the most time. The profile showed that the calculate_parents routine was taking 116 seconds.
After optimizing this routine to do one large lookup and cache the results, we reduced that to 16.5 seconds.
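As a rough illustration of the "one large lookup plus cache" idea, here is a minimal sketch in Python rather than Opsview's Perl. The host names, the `PARENT_ROWS` table and every function name are hypothetical, and the "queries" are simulated with an in-memory list; this is not the actual calculate_parents code.

```python
from collections import defaultdict

# Hypothetical (child_host, parent_host) rows standing in for a database table.
PARENT_ROWS = [
    ("web1", "switch1"),
    ("web2", "switch1"),
    ("db1", "switch2"),
    ("switch1", "router1"),
    ("switch2", "router1"),
]


def fetch_parents_for(host, counter):
    """Slow path: one simulated query per host."""
    counter["queries"] += 1
    return sorted(parent for child, parent in PARENT_ROWS if child == host)


def fetch_all_parents(counter):
    """Fast path: one simulated bulk query, cached in a dict keyed by host."""
    counter["queries"] += 1
    cache = defaultdict(list)
    for child, parent in PARENT_ROWS:
        cache[child].append(parent)
    return cache


def parents_per_host(hosts):
    """Resolve parents with a separate lookup for every host."""
    counter = {"queries": 0}
    result = {h: fetch_parents_for(h, counter) for h in hosts}
    return result, counter["queries"]


def parents_bulk(hosts):
    """Resolve parents from a single bulk lookup held in memory."""
    counter = {"queries": 0}
    cache = fetch_all_parents(counter)
    result = {h: sorted(cache.get(h, [])) for h in hosts}
    return result, counter["queries"]


hosts = ["web1", "web2", "db1", "switch1", "router1"]
slow, n_slow = parents_per_host(hosts)
fast, n_fast = parents_bulk(hosts)
```

Both paths return the same mapping, but the per-host version issues one lookup per host while the bulk version issues one in total. On a system with many hosts, that difference is where the saving would come from.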
Along with other optimizations, our testing shows this has produced an average 30% improvement in overall reload time, rising to 50% on some systems!
Upgrading Opsview Core's Engine
We've been working with the latest Nagios 4 beta-release code since December and have been busy integrating it into Opsview.
The Nagios developers have done a great job of identifying the bottlenecks and finding innovative ways to make it faster and reduce the overhead of monitoring. As part of our engineering effort, we have also fixed some major bugs and pushed the fixes back upstream.
Another improvement concerns environment macros, which had been removed completely in the beta-release code. We recommend environment macros because they avoid the complications of shell expansion, so we've worked with the Nagios developers to define how they should work and provided them with a new feature that works on a per-command basis.
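To see why passing values through the environment sidesteps shell expansion, consider this small Python sketch. It is an illustration only: the child "plugin" is a hypothetical one-liner, and the variable name simply follows Nagios's `NAGIOS_` prefix convention for environment macros.

```python
import os
import subprocess
import sys

# A value containing shell metacharacters, as plugin output sometimes does.
service_output = "disk /var is 95% full; $(echo injected) `date` 'quoted'"

# Hypothetical child "plugin": reads the value from its environment and prints it.
child_code = "import os; print(os.environ['NAGIOS_SERVICEOUTPUT'])"


def run_with_env_macro(value):
    """Pass the value via the environment; no shell ever parses it."""
    env = dict(os.environ, NAGIOS_SERVICEOUTPUT=value)
    completed = subprocess.run(
        [sys.executable, "-c", child_code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.rstrip("\n")


received = run_with_env_macro(service_output)
```

The child receives the string byte for byte, including the `$(...)`, backticks and quotes. The same value interpolated into a shell command line would need careful escaping first.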
We've run some performance tests comparing the old and new versions. Before the upgrade, the 5-minute load average was 1.181. After the upgrade, it is down to only 0.455.
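For readers who want to check how these two measurements relate to the headline 60% figure, the arithmetic can be sketched as follows. The function name is ours, not part of Opsview.

```python
def percent_reduction(before, after):
    """Return the relative reduction from before to after, as a percentage."""
    return (before - after) / before * 100


# Load averages quoted above: 1.181 before the upgrade, 0.455 after.
load_reduction = percent_reduction(1.181, 0.455)
print(f"Load average reduced by {load_reduction:.1f}%")
```

This works out to a reduction of a little over 61%, in line with the rounded 60% figure.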
NDOutils is a project designed to provide realtime status information in a database. We've historically been amending the base code to add in features that we need, but this has been getting harder and harder to do due to its C heritage.
So we've taken on the challenge of rewriting this component in Perl. Taking inspiration from a prototype written by Alan Wijntje, we've written a newer version with the following objectives:
- Use one of Perl's crown jewels, DBI, for database interaction
- Remove MySQL-specific SQL statements
- Add unit tests to prove the database is updated appropriately
- Gain speed and efficiency improvements
One major issue was a MySQL limitation: INSERT … ON DUPLICATE KEY UPDATE statements were incrementing ids unnecessarily. We've overcome this limitation by using separate UPDATE and INSERT statements.
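The separate UPDATE and INSERT approach can be sketched like this. It is a minimal Python example using the standard library's sqlite3 module so that it runs anywhere. The `host_status` table, its columns and the `upsert_status` helper are hypothetical, and the real importer is written in Perl with DBI.

```python
import sqlite3


def upsert_status(conn, host, state):
    """Try UPDATE first; INSERT only when no existing row matched."""
    cur = conn.execute(
        "UPDATE host_status SET state = ? WHERE host = ?", (state, host)
    )
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO host_status (host, state) VALUES (?, ?)", (host, state)
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE host_status ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, host TEXT UNIQUE, state INTEGER)"
)

upsert_status(conn, "web1", 0)  # no row yet, so this inserts
upsert_status(conn, "web1", 2)  # row exists, so this only updates
upsert_status(conn, "db1", 1)   # second insert takes the next id

rows = conn.execute(
    "SELECT id, host, state FROM host_status ORDER BY id"
).fetchall()
```

Because an id is only allocated when a row is actually inserted, repeated updates to `web1` do not consume ids. Two caveats apply if you adapt the pattern: on MySQL the affected-row count for an UPDATE can be zero when the new values equal the old ones (unless the client asks for found rows), and with concurrent writers the two statements need a transaction or a retry on a duplicate-key error.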
We're pleased to say that we can now process 160 MB of data (about 2 hours' worth on a medium-sized system) in 108 seconds, which is over 40% faster than NDOutils.
Time taken to import 2 hours of NDO Logs (seconds)
We think our engineering team have done a fantastic job!