Opsview Core v3.20131016.0 somewhat broken after system failure - Runtime Database not updating for new services

jplantinga

Any help would be appreciated.  I am running Opsview Core 3.2 on CentOS 6.  Since a power failure took down my host on Dec. 24th, 2015, any new hosts, or monitored services added to those hosts, have not shown up in the runtime database.  They show up in the GUI, are being monitored, and alerts are sent when they go down.  Unfortunately, I don't have DB backups from before that time, as this issue was not detected until recently.  Are there ways to rebuild the 'runtime' database?  The issue presents itself when I try to view performance graphs or enable/disable individual services on newly created monitored services.  If I manually insert the missing objects into the runtime database, they are fixed and can be managed, but adding new ones via the GUI still does not work.  I enabled MySQL error logging but no errors appear.  As an interesting data point, the newly created (somewhat defective) items that were not in the runtime database had entries in /usr/local/nagios/var/rrd that were owned by 'root' rather than 'nagios' (unlike all the good working ones), leading me to think that opsview-web may be running as root rather than nagios.

Any assistance would be appreciated.  Cheers.

Duncan Ferguson
Yes, you can rebuild the runtime database - run 'db_runtime db_install' as the nagios user.

The Runtime database powers the UI, so if the UI is updating then something must be getting into the database.

If you need to check permissions then start out with 'rpm -qV opsview opsview-base opsview-web opsview-core opsview-perl' and check all lines returned (see the verify section of the RPM man page for the meaning of each entry). Otherwise run 'find /usr/local/nagios ! -user nagios', but be aware that some files should be owned by root, with group nagios, and be setuid.

Are there any files in nagios/var/ndologs?  It may be these are not being imported correctly into the database by import_ndologsd - also check /var/log/opsview/opsviewd.log for errors

  Duncs

jplantinga
Thank you for the reply! 

I ran the command to rebuild the runtime database, but I see that it has essentially emptied the database, so the issue I had with a few newly added monitors now extends to all my monitored services.  Is there a subsequent command that re-populates the database based upon the config files?  (I have a DB backup from just before that command was run, so I can bring it back to its partially broken state too.)

I ran the permission check.  There were some noted changes (I removed many obvious entries from the list, such as log and config files whose size or date would naturally have changed).

.M.......    /usr/local/nagios/bin
.M....G..    /usr/local/nagios/etc
.....UG..    /usr/local/nagios/installer
.M.......    /usr/local/nagios/lib
.M.......    /usr/local/nagios/libexec
.M.......    /usr/local/nagios/sbin
.M.......    /usr/local/nagios/share
.M.......    /usr/local/nagios/share/images
.M.......    /usr/local/nagios/share/images/logos
.M...UG..    /usr/local/nagios/share/media
.M.......    /usr/local/nagios/share/stylesheets
.M.......    /usr/local/nagios/snmp
.M.......    /usr/local/nagios/snmp/all
.M.......    /usr/local/nagios/snmp/load
.M.......    /usr/local/nagios/var
.M.......    /usr/local/nagios/var/spool
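
For anyone reading the flag columns above, here is a minimal Python sketch that decodes them. It assumes the standard nine-character `rpm -V` attribute string (S, M, 5, D, L, U, G, T, P, in that order, as described in the verify section of the rpm man page) and that paths contain no spaces. The function name `decode_verify_line` is made up for illustration, and "missing" lines are not handled.

```python
# Attribute letters used by `rpm -V`, in column order (per the rpm man page).
RPM_VERIFY_FLAGS = [
    ("S", "file size differs"),
    ("M", "mode differs (permissions and/or file type)"),
    ("5", "digest (checksum) differs"),
    ("D", "device major/minor number mismatch"),
    ("L", "readlink(2) path mismatch"),
    ("U", "user ownership differs"),
    ("G", "group ownership differs"),
    ("T", "mtime differs"),
    ("P", "capabilities differ"),
]


def decode_verify_line(line):
    """Return (path, [descriptions]) for one line of `rpm -qV` output.

    Hypothetical helper: assumes the first field is the flag string and
    the last field is the path (an optional file-type marker such as 'c'
    may sit between them and is ignored).
    """
    fields = line.split()
    flags, path = fields[0], fields[-1]
    changes = []
    for (letter, meaning), seen in zip(RPM_VERIFY_FLAGS, flags):
        if seen == letter:
            changes.append(meaning)
        elif seen == "?":
            changes.append(letter + ": test could not be performed")
    return path, changes


if __name__ == "__main__":
    sample = [
        ".M....G..    /usr/local/nagios/etc",
        ".....UG..    /usr/local/nagios/installer",
    ]
    for entry in sample:
        print(decode_verify_line(entry))
```

Read this way, most of the entries above report only a mode difference, while the installer and share/media entries also report user and group ownership differences.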

As for files not owned by the nagios user:

/usr/local/nagios/bin/snmpd
/usr/local/nagios/bin/install_slave
/usr/local/nagios/etc/ilo
/usr/local/nagios/etc/ilo/ilo.cfg
/usr/local/nagios/etc/objects/check_hyperv_perf.cfg
/usr/local/nagios/etc/nrpe_local/override.cfg
/usr/local/nagios/installer
/usr/local/nagios/installer/preremove
/usr/local/nagios/installer/postinstall_root
/usr/local/nagios/nagvis/var/header-default-cache
/usr/local/nagios/nagvis/var/automap.png
/usr/local/nagios/nagvis/var/automap.dot
/usr/local/nagios/nagvis/var/__automap.cfg-1.4.4-cache
/usr/local/nagios/nagvis/var/nagvis.ini.php-1.4.4-cache
/usr/local/nagios/nagvis/var/opsview.cfg-1.4.4-cache
/usr/local/nagios/nagvis/var/hover-default-cache
/usr/local/nagios/nagvis/var/context-default-cache
/usr/local/nagios/share/media
/usr/local/nagios/share/faviconCommunity.ico
/usr/local/nagios/share/faviconEnterprise.ico
/usr/local/nagios/var/log/DBVersion.log
/usr/local/nagios/var/ndologs/1452407120.177912
/usr/local/nagios/var/ndologs/1452407115.173795
/usr/local/nagios/var/ndologs/1452407130.265327
/usr/local/nagios/libexec/hpiLO_nagios_config
/usr/local/nagios/libexec/check_icmp
/usr/local/nagios/libexec/index.html
/usr/local/nagios/libexec/nagios_hpilo_traps
/usr/local/nagios/libexec/index.html.1
/usr/local/nagios/libexec/nagios_hpilo_cfg_generator
/usr/local/nagios/libexec/check_dhcp
/usr/local/nagios/libexec/hpiLO_nagios_config.cfg
/usr/local/nagios/perl/bin
/usr/local/nagios/perl/lib
/usr/local/nagios/perl/man
/usr/local/nagios/include

Of these, only check_dhcp and check_icmp were setuid.  (There were many image files in /share/images that I chown'ed over to nagios:nagios to be consistent with the other files in that folder.  I didn't think changing these would matter much.)
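
To repeat this ownership check and see the setuid bit in the same pass, something like the following Python sketch could be used. It is only an approximation of 'find /usr/local/nagios ! -user nagios': it does not report the top-level directory itself, and the function name `find_unexpected_owners` is made up for illustration.

```python
import os
import stat


def find_unexpected_owners(root, expected_uid):
    """Yield (path, is_setuid) for entries under root not owned by expected_uid.

    Hypothetical helper; the root directory itself is not reported.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            info = os.lstat(path)  # lstat: inspect symlinks, do not follow them
            if info.st_uid != expected_uid:
                yield path, bool(info.st_mode & stat.S_ISUID)


if __name__ == "__main__":
    import pwd

    # Assumes a local 'nagios' account exists, as on the server in this thread.
    nagios_uid = pwd.getpwnam("nagios").pw_uid
    for path, is_setuid in find_unexpected_owners("/usr/local/nagios", nagios_uid):
        print(("setuid " if is_setuid else "       ") + path)
```

Entries flagged as setuid (check_dhcp and check_icmp here) are the ones that are expected to be owned by root, so they should not be chown'ed to nagios.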

There are many hundreds of files in /ndologs, dating from the present back a couple of months.
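
To put a number on that backlog, here is a small Python sketch. It assumes the ndologs file names (for example 1452407120.177912) are Unix epoch timestamps, which is a guess based on how they look rather than anything documented in this thread. The function name `ndologs_backlog` is made up for illustration.

```python
import os
import time


def ndologs_backlog(directory, now=None):
    """Return (file_count, oldest_age_seconds) for timestamp-named files.

    Assumption: each file name is an epoch timestamp such as
    '1452407120.177912'. Names that do not parse as numbers are skipped.
    """
    now = time.time() if now is None else now
    stamps = []
    for name in os.listdir(directory):
        try:
            stamps.append(float(name))
        except ValueError:
            continue  # not a timestamp-style name
    if not stamps:
        return 0, 0.0
    return len(stamps), now - min(stamps)


if __name__ == "__main__":
    count, age = ndologs_backlog("/usr/local/nagios/var/ndologs")
    print("%d files waiting, oldest is %.1f days old" % (count, age / 86400))
```

If the count keeps growing between runs, that would suggest import_ndologsd is not keeping up or not importing at all.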

As for errors in opsviewd.log, there are only WARN errors from import_ndologsd saying the import is taking longer than 5 seconds.  (For a short period, there were some FATAL errors when the 'db_runtime db_install' was taking place where there was no db/table to insert into but that's understandable.)  I guess stopping opsview/opsview-web during the rebuild would have prevented that...

Previously, I looked in the opsviewd.log file for errors when adding new hosts or services - expecting to find a db insert failure but alas, nothing.

Thanks again for the assistance.