Cluster monitoring with Nagios

Mission: We have a cluster of 2 devices. From the Nagios server's network, we can't ping both devices at the same time: one device answers for a while and the other one doesn't, until they swap roles.
Nagios raises an alert as soon as one of the devices becomes unreachable, but it's a false alarm because the other device is reachable and the cluster is healthy.
We want an alarm only if both devices are unreachable.

Solution:
To do this I used a Nagios plugin called check_multiaddr.
I found it here: http://exchange.nagios.org/directory/Plugins/Others/check_multiaddr/details
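
To make the idea concrete, here is a minimal shell sketch of the behaviour I wanted: run the same check against each address and report OK as soon as one of them answers. This is not the code of check_multiaddr.pl (which is a Perl script). The function name check_any and the way it appends -H are assumptions made for illustration only.

```shell
# Hypothetical helper, NOT the check_multiaddr.pl implementation.
# Usage: check_any ADDR1,ADDR2,... COMMAND [ARGS...]
# Runs "COMMAND ARGS... -H ADDR" for each address in turn.
# Returns 0 (Nagios OK) as soon as one address passes the check,
# and 2 (Nagios CRITICAL) if none of them does.
# Any non-zero status (WARNING included) counts as a failure in this sketch.
check_any() {
    addresses=$1
    shift
    for addr in $(printf '%s' "$addresses" | tr ',' ' '); do
        if output=$("$@" -H "$addr"); then
            echo "$addr: $output"
            return 0
        fi
    done
    echo "CRITICAL - no address answered: $addresses"
    return 2
}

# Example (assumes check_ping is installed in this directory):
# check_any 10.2.0.2,10.2.0.4 /usr/lib/nagios/plugins/check_ping -w 800.0,20% -c 999.0,60% -p 5
```

The sketch stops at the first address that answers, which matches the goal here: the cluster is considered healthy as long as one device is reachable.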

How I deployed it:
– I uploaded the file check_multiaddr.pl to my Nagios server
– I copied it into my nagios plugins directory (/usr/lib/nagios/plugins/)
– I made it executable:

#chmod +x check_multiaddr.pl

– I tested the script:
I wanted to test the ping service (example with the options already set: check_ping -H $HOSTADDRESS$ -w 800.0,20% -c 999.0,60% -p 5).
My devices' IP addresses are 10.2.0.2 and 10.2.0.4.
So here is the command I executed:

#./check_multiaddr.pl /usr/lib/nagios/plugins/check_ping -H 10.2.0.2,10.2.0.4 -w 800.0,20% -c 999.0,60% -p 5

– I got a timeout error:

Timeout detected (9s - you can edit its duration in ./check_multiaddr.pl).

– I edited the check_multiaddr.pl file and increased the TIMEOUT value (it was set to 9 seconds):

my $TIMEOUT = 20;

– I executed the same command and this time it worked:

10.2.0.2: PING OK - Packet loss = 0%, RTA = 0.97 ms|rta=0.970000ms;800.000000;999.000000;0.000000 pl=0%;20;60;0

– Now I had to change the Nagios configuration. I added a new host to represent my cluster, and I entered the 2 IP addresses separated by a comma (where I would usually enter a single IP address).
– I also created 2 new commands based on my existing ones: one to check whether the host is alive, and one for the standard ping service.
So a command that used to be: $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 1
became: $USER1$/check_multiaddr.pl $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 1
– And of course I assigned these commands to the new host's check command and to the ping service for this host.
– Then I reloaded Nagios to apply the new configuration.
– At first I got timeout errors (again?):

(Host Check Timed Out)

But this time it wasn't because of the script setting. I had to change the host check timeout (the host_check_timeout directive) in the Nagios main configuration file: I changed the value to 15 (it was set to 10). If you're using Centreon to manage the Nagios configuration, the parameter is in Nagios -> nagios.cfg -> Logs Options.
– I reloaded again and it worked :-)
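
For reference, the configuration changes described above could look roughly like this. It is only a sketch: the object names (cluster1, check_ping_multiaddr) and the generic-host / generic-service templates are assumptions, and a single command is reused for the host check and the ping service to keep it short. Only the command line, the two addresses and the timeout value come from the steps above.

```cfg
# nagios.cfg (main configuration file)
host_check_timeout=15

# Object configuration (names and templates below are placeholders)
define command {
    command_name    check_ping_multiaddr
    command_line    $USER1$/check_multiaddr.pl $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 1
}

define host {
    use             generic-host
    host_name       cluster1
    alias           Cluster of 2 devices
    address         10.2.0.2,10.2.0.4
    check_command   check_ping_multiaddr
}

define service {
    use                  generic-service
    host_name            cluster1
    service_description  PING
    check_command        check_ping_multiaddr
}
```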

Again, don’t hesitate to write your questions/comments in French, Portuguese, Italian, Spanish or Romanian.


Add user access to the Nagios web interface

Mission: Give access to the Nagios web interface.

First, create a new user.
On the Nagios server:

# htpasswd /etc/nagios3/htpasswd.users newusername

htpasswd.users might be located somewhere else (you can use the locate command to find it), and on some systems the command is htpasswd2 instead of htpasswd.
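
Before adding a user, it can be handy to check whether the name already exists in the password file. The sketch below assumes the usual htpasswd layout of one user:hash entry per line; the function name htpasswd_has_user is made up for this example.

```shell
# Hypothetical helper: succeeds if USER already has an entry in FILE.
# Assumes one "user:hash" entry per line.
htpasswd_has_user() {
    file=$1
    user=$2
    # cut keeps only the user names; grep -x matches the whole line,
    # -F treats the name as a plain string, -q prints nothing.
    cut -d: -f1 "$file" | grep -qxF -- "$user"
}

# Example:
# htpasswd_has_user /etc/nagios3/htpasswd.users newusername && echo "already there"
```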

– I'm not sure it's necessary, but I also reloaded Apache:

/etc/init.d/apache2 reload

Then you have 2 options:

  • Option #1: you allow your user to view only specific hosts. In this case, if you enabled external commands in the Nagios configuration, your user will also be allowed to execute commands through the Nagios interface.
  • Option #2: you allow your user to view all hosts. In this case you can deny this user the execution of external commands.

You can find explanations in the official documentation: http://nagios.sourceforge.net/docs/3_0/cgiauth.html

Implementation of Option #1:
– Create a contact with the same username as the one you used in the htpasswd command
– Add the contact to each host you want the user to access (no need to add the contact at the service level: a contact for a host can also view all the services associated with that host)
– Reload Nagios
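
As an illustration, the two objects involved in Option #1 could look like this. The generic-contact template, the alias and the e-mail address are assumptions; the only thing that matters is that contact_name matches the name given to htpasswd.

```cfg
# Contact whose name matches the htpasswd user
define contact {
    use             generic-contact
    contact_name    newusername
    alias           New user
    email           newusername@example.com
}

# In each host the user is allowed to see
define host {
    ; ... existing directives of the host ...
    contacts        newusername
}
```

If the host already has a contacts line, append the new name to it, separated by a comma.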

Implementation of Option #2:
– Modify the cgi.cfg file to give the user (use the same name as in the htpasswd command) the following rights:

  • System/Process Information Access
  • Global Host Information Access
  • Global Service Information Access

– Reload Nagios
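
In cgi.cfg, these three rights correspond to the directives below. This is a sketch that assumes the default nagiosadmin account is already listed; newusername is simply appended to the view-only directives and left out of the command ones.

```cfg
# View rights: add the new user
authorized_for_system_information=nagiosadmin,newusername
authorized_for_all_hosts=nagiosadmin,newusername
authorized_for_all_services=nagiosadmin,newusername

# Command rights: left unchanged, so newusername cannot run external commands
authorized_for_system_commands=nagiosadmin
authorized_for_all_host_commands=nagiosadmin
authorized_for_all_service_commands=nagiosadmin
```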

References:
http://nagios.sourceforge.net/docs/3_0/cgiauth.html
http://www.linuxquestions.org/questions/linux-newbie-8/how-to-create-another-user-for-nagios-web-interface-607353/
http://linuxsysadminblog.com/2009/05/setup-nagios-user-to-view-specific-host-and-services/

Don’t hesitate to write your comments in French, Portuguese, Spanish, Italian or Romanian!
