Cluster monitoring with Nagios

Mission: We have a cluster of 2 devices. From the Nagios server network, we can’t ping both devices at the same time. It will be possible to ping one device for a while and impossible to ping the other one until the contrary happens.
An alert will appear in Nagios if one of the device becomes unreachable but it’s going to be a false alarm as the other device is reachable and the cluster healthy…
We want an alarm only if both devices are unreachable.

To do this I used a Nagios plugins called check_multiaddr.
I found it there:

How I deployed it:
– I uploaded the file to my Nagios server
– I copied it into my nagios plugins directory (/usr/lib/nagios/plugins/)
– I made it executable:

#chmod +x

– I tested the script:
I want to test the ping service (Example with options already set: check_ping -H $HOSTADDRESS$ -w 800.0,20% -c 999.0,60% -p 5)
My devices IP addresses are and
So, here is the command I executed:

#./ /usr/lib/nagios/plugins/check_ping -H,10.2.0
.4 -w 800.0,20% -c 999.0,60% -p 5

– I got a timeout error:

Timeout detected (9s - you can edit its duration in ./

– I edited the file and changed the TIMEOUT value from 9 to 15 seconds:

my $TIMEOUT = 20;

– I executed the same command and this time it worked: PING OK - Packet loss = 0%, RTA = 0.97 ms|rta=0.970000ms;800.000000;999.000000;0.000000 pl=0%;20;60;0

– Now I had to change the Nagios configuration. I added a new host to represent my cluster. And I typed the 2 IP addresses separated by a comma (where I usually enter 1 IP address)
– I also created 2 new commands based on my existing commands: one to check if the host is alive and the standard ping service.
So, a command which was before: $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 1
became: $USER1$/ $USER1$/check_ping -H $HOSTADDRESS$ -w 3000.0,80% -c 5000.0,100% -p 1
– And of course I assigned this commands to the new host check command and to the ping service for this host.
– Then I reloaded Nagios to make it apply the new configuration
– At first I got timeout errors (again?):

(Host Check Timed Out)

But this time it wasn’t because of the script setting. I had to change the Host Check Timeout in Nagios main configuration file. I changed the value to 15 (it was set to 10). If you’re using Centreon to set the Nagios configuration, the parameter is in Nagios -> nagios.cfg > Logs Options
– I reloaded again and it worked :-)

Again, don’t hesitate to write your questions/comments in French, Portuguese, Italian, Spanish or Romanian.

Cluster monitoring with Nagios

2 thoughts on “Cluster monitoring with Nagios

  1. unknown says:

    Is it possible to priortize the IP addresses the way we like? I mean is there a way to check x number of ip gets check first over y? And if x fails y would reply.

    1. No I don’t think. It just checks the 2 IPs the same way and sends back the best state of the IP checks (if one IP is OK and the other is CRITICAL => sends OK, if one IP is WARNING and the other CRITICAL => sends WARNING). I don’t think you can ask to check one IP more than the other…

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s