Tuesday 27 October 2015

Add an event handler in Nagios


It's handy if Nagios can take care of some of your tasks while an incident is happening on your application servers, and its event handler feature does exactly that. Say Nagios sends a WARNING/CRITICAL alert for high memory on an application server, and at that moment you need stats from the logs, such as the number of requests per second over the last 10 minutes or the network transactions on the API server over the last 10 minutes. Or you need to restart a service and send a mail about it. An event handler can run practically any such task whenever a Nagios service changes state, for example when it goes into WARNING or CRITICAL.

You need the following already configured in your environment:
  • Nagios monitoring with NRPE
  • The NRPE client compiled with --enable-command-args
  • Nagios services configured to run their checks through NRPE
  • A shell script for the task to run after an incident/alert is received

This assumes Nagios 3.0 or later is running in your environment and monitoring your hosts and services.
In this example the client (application server) runs Ubuntu and is monitored by a Nagios 4.1.1 server compiled from source.

To allow NRPE to accept command arguments, compile it with --enable-command-args on the client/application server:
#tar -zxvf nrpe-2.15.tar.gz; cd nrpe-2.15
#./configure --with-ssl=/usr/bin/openssl --enable-command-args --with-ssl-lib=/usr/lib/x86_64-linux-gnu && make all && make install-plugin && make install-daemon && make install-daemon-config

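Add the commands to the client's $NRPE_HOME/etc/nrpe.cfg (here $NRPE_HOME=/usr/local/nagios). Since api-rep-gen takes arguments, also set dont_blame_nrpe=1 in this file; while it is 0, NRPE rejects command arguments at runtime even when compiled with --enable-command-args.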
command[check_mem_rep]=/usr/local/nagios/libexec/check_mem.sh -w 72 -c 75

command[api-rep-gen]=/usr/local/nagios/libexec/run-api.sh $ARG1$ $ARG2$ $ARG3$
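The check_mem.sh plugin itself is not shown in this post. As a hypothetical sketch, assuming a Linux kernel that reports MemAvailable in /proc/meminfo (3.14 or later), it could compute used memory as a percentage and map it to the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) using the -w/-c thresholds from nrpe.cfg:

```shell
#!/bin/bash
# check_mem.sh - hypothetical sketch of the memory plugin (original not shown).
# Usage: check_mem.sh -w <warn_pct> -c <crit_pct>
# Exit codes follow the Nagios plugin convention: 0=OK 1=WARNING 2=CRITICAL 3=UNKNOWN

# Print the percentage of memory in use, from a meminfo-format file.
# Assumes the MemAvailable field exists (Linux 3.14+).
mem_used_pct() {
    local file="${1:-/proc/meminfo}"
    awk '/^MemTotal:/ {t=$2} /^MemAvailable:/ {a=$2}
         END { if (t > 0) printf "%d\n", (t - a) * 100 / t }' "$file"
}

# Map a usage percentage to "STATE exit_code" using the thresholds.
classify() {
    local used=$1 warn=$2 crit=$3
    if [ "$used" -ge "$crit" ]; then echo "CRITICAL 2"
    elif [ "$used" -ge "$warn" ]; then echo "WARNING 1"
    else echo "OK 0"
    fi
}

main() {
    local warn="" crit="" opt used state code
    while getopts "w:c:" opt; do
        case $opt in
            w) warn=$OPTARG ;;
            c) crit=$OPTARG ;;
            *) echo "UNKNOWN - usage: $0 -w <pct> -c <pct>"; exit 3 ;;
        esac
    done
    if [ -z "$warn" ] || [ -z "$crit" ]; then
        echo "UNKNOWN - usage: $0 -w <pct> -c <pct>"; exit 3
    fi
    used=$(mem_used_pct)
    if [ -z "$used" ]; then
        echo "UNKNOWN - cannot read memory stats"; exit 3
    fi
    read -r state code <<< "$(classify "$used" "$warn" "$crit")"
    # Text before "|" is shown in Nagios; the part after it is performance data.
    echo "MEMORY $state - ${used}% used | used=${used}%;$warn;$crit"
    exit "$code"
}

# NRPE always passes -w and -c (see nrpe.cfg); with no arguments the file
# only defines its functions, which makes it easy to test.
if [ "$#" -gt 0 ]; then
    main "$@"
fi
```

With the thresholds above, `check_mem.sh -w 72 -c 75` reports WARNING from 72% used and CRITICAL from 75%.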

On the Nagios server, add the event handler command to $NAGIOS_HOME/etc/objects/commands.cfg and attach it to the service with the event_handler directive:

define command{
    command_name    my_eventhandler
    command_line    /usr/local/nagios/libexec/check_nrpe -H $HOSTADDRESS$ -c api-rep-gen -a $ARG1$ $ARG2$ $ARG3$
}

define service{
    use                     prod-service
    host_name               kings-api-server-5
    service_description     run-api-report
    check_command           check_nrpe!check_mem_rep
    max_check_attempts      1
    notifications_enabled   1
    event_handler           my_eventhandler!$SERVICESTATE$!$SERVICESTATETYPE$!$SERVICEATTEMPT$
}
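The post doesn't show run-api.sh itself. Here is a minimal sketch, assuming it receives the service state, state type and attempt number as $1, $2 and $3 (the order used in the event_handler line) and following the usual event handler pattern of acting only on HARD problem states. REPORT_FILE, SERVICE_NAME and MAIL_TO are hypothetical placeholders, and the actions (saving free/ps output, restarting a service through sudo, mailing the report) stand in for whatever tasks you actually need:

```shell
#!/bin/bash
# run-api.sh - hypothetical sketch of the client-side event handler script.
# Arguments, forwarded by check_nrpe -a:
#   $1 = $SERVICESTATE$      (OK, WARNING, CRITICAL, UNKNOWN)
#   $2 = $SERVICESTATETYPE$  (SOFT or HARD)
#   $3 = $SERVICEATTEMPT$    (current check attempt)

# Hypothetical placeholders: adjust for your environment.
REPORT_FILE=/tmp/api-incident-report.txt
SERVICE_NAME=my-api
MAIL_TO=ops@example.com

# Succeed only for a HARD WARNING/CRITICAL state. With max_check_attempts 1,
# the first failed check is already HARD.
should_act() {
    local state=$1 type=$2
    [ "$type" = "HARD" ] || return 1
    case "$state" in
        WARNING|CRITICAL) return 0 ;;
        *) return 1 ;;
    esac
}

run_actions() {
    local state=$1
    {
        echo "=== $(date) memory alert: $state ==="
        free -m
        ps aux --sort=-%mem | head -n 6   # top memory consumers
    } > "$REPORT_FILE"
    # NRPE runs as the nagios user, which needs passwordless sudo for this.
    sudo -n service "$SERVICE_NAME" restart >> "$REPORT_FILE" 2>&1
    # Needs a mail command, e.g. from the mailutils package.
    mail -s "[$state] $(hostname): restarted $SERVICE_NAME" "$MAIL_TO" < "$REPORT_FILE"
}

handle_event() {
    if should_act "$1" "$2"; then
        # Detach the work so check_nrpe (10 s default timeout) gets a quick reply.
        run_actions "$1" > /dev/null 2>&1 &
        echo "Event handler: actions started for $1 $2 (attempt ${3:-?})"
    else
        echo "Event handler: nothing to do for ${1:-no state} ${2:-}"
    fi
}

handle_event "$@"
```

Running `/usr/local/nagios/libexec/run-api.sh OK HARD 1` on the client only prints the "nothing to do" message, while `CRITICAL HARD 1` runs the actions for real, so try the harmless case first.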