Whitepaper

Opsview Monitor Master to Slave Node Communication

Opsview Monitor runs many daemons on both the Master and Slave Node servers to allow for efficient monitoring of hosts. Some of these daemons need to communicate between the Slave Nodes and the Master to report status information or to carry out tasks, such as running ad-hoc checks, acknowledging problems or applying downtime.

To improve security and aid with administration, SSH is used to open and maintain a single communication channel between the Master and each of the Slave Nodes; all Opsview Monitor daemons then use this tunnel to pass information and data. SSH encrypts all traffic between the servers to prevent snooping of data and to prevent other systems or users accessing unnecessary information.

Default Configuration

By default, the Master opens an SSH tunnel on port 22 to each Slave Node and starts a process to monitor the connection. This tunnel also sets up the following port forwards:

Port 5669 is set up to listen on the Slave Node and redirect to port 5669 on the Master
Port 4125 is set up to listen on the Slave Node and redirect to port 4125 on the Master
Port 5667 is set up to listen on the Slave Node and redirect to port 5667 on the Master
Port 2345 is set up to listen on the Slave Node and redirect to port 2345 on the Master
Port 5700 is set up to listen on the Slave Node and redirect to port 5700 on the Master
Port 25800+X is set up to listen on the Master and redirect to port 5666 on the Slave Node

Each Slave Node is assigned a unique port number starting from 25800. This port is opened on the Master and redirected to port 5666 on that Slave Node, which allows the Master to send specific communications to each Slave Node's Agent process.
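The forwarding layout above can be sketched as a set of ssh -R and -L arguments. This is an illustration only, not the command line Opsview Monitor itself builds: the function name, the use of 127.0.0.1 as the bind and target address, and the assumption that X starts at 0 for the first Slave Node are all hypothetical.

```python
# Ports that listen on the Slave Node and redirect to the same port on the Master.
SLAVE_TO_MASTER_PORTS = [5669, 4125, 5667, 2345, 5700]
BASE_MASTER_PORT = 25800  # first per-slave port opened on the Master
AGENT_PORT = 5666         # Opsview Agent port on the Slave Node


def forward_args(slave_index):
    """Build ssh forwarding arguments for Slave Node number X.

    Assumption: X (slave_index) starts at 0, so the first Slave Node
    is reached through port 25800 on the Master.
    """
    args = []
    for port in SLAVE_TO_MASTER_PORTS:
        # -R: listen on the remote side (Slave Node), deliver to the
        # local side (Master), because the Master initiates the tunnel.
        args += ["-R", f"{port}:127.0.0.1:{port}"]
    # -L: listen on the local side (Master), deliver to the agent
    # on the Slave Node.
    args += ["-L", f"{BASE_MASTER_PORT + slave_index}:127.0.0.1:{AGENT_PORT}"]
    return args


if __name__ == "__main__":
    print(" ".join(forward_args(3)))
```

Because the Master initiates the connection in the default configuration, the Slave-side listeners are remote forwards (-R) and the per-slave port on the Master is a local forward (-L).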

To allow the port forwarding to be started, the AllowTcpForwarding option in the sshd_config file must be set to 'yes' (this is the default).
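As a quick check of this setting, the sketch below reads the text of an sshd_config file and reports the global AllowTcpForwarding value. It is a simplified illustration: the function name is hypothetical, only the global section before the first Match block is examined, and Include directives are not followed.

```python
def allow_tcp_forwarding(sshd_config_text):
    """Return the global AllowTcpForwarding value, or 'yes' if unset.

    Simplifications (assumptions of this sketch): Match blocks and
    Include files are ignored; keyword and value are whitespace-separated.
    """
    for raw_line in sshd_config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        keyword = parts[0].lower()  # sshd_config keywords are case-insensitive
        if keyword == "match":
            break  # stop at the first conditional block
        if keyword == "allowtcpforwarding" and len(parts) > 1:
            # The first value found for a keyword is the one that applies.
            return parts[1].strip('"').lower()
    return "yes"  # default when the option is not set


if __name__ == "__main__":
    sample = "Port 22\n#AllowTcpForwarding no\nAllowTcpForwarding yes\n"
    print(allow_tcp_forwarding(sample))
```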

To improve the reliability of the SSH tunnel, the following options are used. These options are fully documented in the SSH man pages for ssh and ssh_config:

  • -n: redirect STDIN from /dev/null
  • -T: disable pseudo-tty allocation
  • -2: force ssh to try protocol version 2 only
  • -o TCPKeepAlive=yes: (depending on OS) use TCP keepalive messages to keep the tunnel active
  • -o ServerAliveCountMax=3: (depending on OS) maximum number of keepalive messages sent to the remote side without a response before disconnecting
  • -o ServerAliveInterval=10: (depending on OS) interval in seconds after which a keepalive message is sent to the remote side when no data has been received
  • -o ExitOnForwardFailure=yes: (depending on SSH version) close the connection if the port forwarding fails
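The options listed above can be collected into an argument list, and the two ServerAlive settings together determine roughly how long an unresponsive Slave Node goes undetected: the interval multiplied by the maximum count, or about 30 seconds with the values given. The sketch below shows both; the function and variable names are hypothetical, and the remaining parts of the ssh command line are not covered because they are not described here.

```python
# Flags and -o options taken from the list above.
TUNNEL_FLAGS = ["-n", "-T", "-2"]
TUNNEL_OPTIONS = {
    "TCPKeepAlive": "yes",
    "ServerAliveCountMax": "3",
    "ServerAliveInterval": "10",
    "ExitOnForwardFailure": "yes",
}


def option_args(flags=TUNNEL_FLAGS, options=TUNNEL_OPTIONS):
    """Return the flags followed by one '-o Name=value' pair per option."""
    args = list(flags)
    for name, value in options.items():
        args += ["-o", f"{name}={value}"]
    return args


def unresponsive_timeout_seconds(options=TUNNEL_OPTIONS):
    """Approximate time before ssh gives up on a silent remote side."""
    interval = int(options["ServerAliveInterval"])
    max_missed = int(options["ServerAliveCountMax"])
    return interval * max_missed


if __name__ == "__main__":
    print(" ".join(option_args()))
    print(unresponsive_timeout_seconds())
```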

The Master monitors the SSH tunnel. If any communication failures are detected, the current connection is closed (if necessary) and a new connection is initiated.
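The monitoring behaviour described above amounts to a check-and-repair step. The following is a minimal sketch of that logic rather than Opsview Monitor's actual implementation: all four callables are hypothetical placeholders supplied by the caller, and how the real monitor process detects failures is not specified here.

```python
def check_and_repair(is_open, is_healthy, close_tunnel, open_tunnel):
    """Run one monitoring pass; return True if the tunnel was restarted.

    All four arguments are caller-supplied callables (hypothetical):
    is_open() and is_healthy() return booleans, close_tunnel() and
    open_tunnel() perform the corresponding action.
    """
    tunnel_open = is_open()
    if tunnel_open and is_healthy():
        return False  # nothing to do
    if tunnel_open:
        close_tunnel()  # close the current connection only if necessary
    open_tunnel()  # initiate a new connection
    return True


if __name__ == "__main__":
    actions = []
    restarted = check_and_repair(
        is_open=lambda: True,
        is_healthy=lambda: False,
        close_tunnel=lambda: actions.append("close"),
        open_tunnel=lambda: actions.append("open"),
    )
    print(restarted, actions)
```

Passing the checks and actions in as callables keeps the decision logic separate from how the tunnel is actually probed or started.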

In order for the SSH tunnels to work seamlessly, SSH keys for the nagios user must be generated on the Master server, and the public key transferred to each Slave Node and added to the nagios user's 'authorized_keys' file. This allows the tunnels to be opened without entering a password for each connection. As a result, the nagios user on the Slave Nodes does not need a password to be set. The way the nagios user account password is locked is critical, however: some locking methods disable the use of cron for the user, which Opsview relies on to run scheduled jobs on both the Master and the Slave Nodes.

Reverse Configuration

Reverse SSH tunnels can be set up on a per Slave Node basis - the difference from the default configuration (above) is that the Slave Node initiates the SSH tunnel to the Master rather than the other way around.

In this configuration, autossh is used on the Slave Nodes to open and maintain the tunnels. The forwarded ports are exactly the same as in the default configuration, above, except for:

Port 25800+X is set up to listen on the Master and redirect to port 22 on the Slave Node

This allows the Master to run commands (such as applying downtime or acknowledging a problem) on the Slave Node.
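Because the Slave Node initiates the connection in the reverse configuration, the direction of each forward flips relative to the default. The sketch below illustrates the resulting ssh -L and -R arguments (autossh passes such arguments through to ssh). As before, the function name, the use of 127.0.0.1, and X starting at 0 are assumptions, and autossh's own options are not shown.

```python
# Ports that listen on the Slave Node and redirect to the same port on the Master.
SLAVE_LISTEN_PORTS = [5669, 4125, 5667, 2345, 5700]
BASE_MASTER_PORT = 25800  # first per-slave port opened on the Master
SLAVE_SSH_PORT = 22       # sshd on the Slave Node


def reverse_forward_args(slave_index):
    """Build forwarding arguments for a tunnel opened from the Slave Node.

    Assumption: X (slave_index) starts at 0 for the first Slave Node.
    """
    args = []
    for port in SLAVE_LISTEN_PORTS:
        # -L: listen on the local side (Slave Node), deliver to the
        # remote side (Master).
        args += ["-L", f"{port}:127.0.0.1:{port}"]
    # -R: listen on the remote side (Master), deliver to sshd on the
    # Slave Node so the Master can run commands there.
    args += ["-R", f"{BASE_MASTER_PORT + slave_index}:127.0.0.1:{SLAVE_SSH_PORT}"]
    return args


if __name__ == "__main__":
    print(" ".join(reverse_forward_args(2)))
```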

In this configuration, the nagios user SSH keys must be generated on each Slave Node, the public keys transferred to the Master and then added to the nagios user 'authorized_keys' file.

Port Information

The ports redirected over the SSH tunnel are used as follows:

  • 5667/tcp - NSCA daemon - Legacy method used for transferring check result information from the slave to the master - still used by some customers
  • 5669/tcp - NRD daemon - Method used for transferring Service and Host check result information from the slave to the master
  • 2345/tcp - NMIS daemon - Legacy method used for gathering detailed information about networking devices and sending to the Master
  • 5700/tcp - opsviewhd - Method used for transferring NetFlow traffic data to the Master
  • 4125/tcp - activemq - Opsview Messaging Service used for stateless communication between some Opsview Monitor daemons (such as NetAudit)
  • 25800+X/tcp - Opsview Agent - used (predominantly in the default configuration) to check the health of the SSH tunnel: the Master checks that the Opsview Agent on the Slave Node responds correctly