Nagios does a pretty good job of watching over my services and hosts, but I want to do a little more with the events it creates – when it checks a service and something is wrong, or when something recovers. In particular I want to give my clients the ability to select at an incredibly high resolution what sort of notifications they get, for what services, how often, and at what level of technical detail. Coupled with this I want to up-sell the services that Xeriom offers – if the disk is getting full or the transfer quota is being consumed so fast that it wont last until the end of the month I want to make it easy to upgrade plans. I’d also like to be able to try out some fun things – iPhone push notifications, SMS gateways, audible alarms, whatever – without worrying that I might destroy Nagios and bring my monitoring setup to its knees.

Message queues are a great way of decoupling systems, moving risk and complexity elsewhere. Nagios shouldn’t have to worry about all of the stuff I want to build around the monitoring system, it should focus just on the core features that I like it for: monitoring my hosts and services.

Luckily, I already have ActiveMQ running for other tasks, writing a STOMP client using SMQueue is pretty trivial, and Nagios has several ways to execute external commands when events happen including the global host and service event handlers. All I need is a command to have Nagios run that’ll accept a bunch of information from Nagios and stick it on the message queue.

Here’s what I came up with:

require 'rubygems'
require 'smqueue'
require 'json'

message = {
  :hostname => ARGV[2],
  :service => ARGV[3],
  :state => ARGV[4],
  :state_type => ARGV[5],
  :state_time => ARGV[6].to_i,
  :attempt => ARGV[7].to_i,
  :max_attempts => ARGV[8].to_i,
  :time_t => Time.now.to_i
}

configuration = {
  :host => ARGV[0],
  :name => ARGV[1],
  :adapter => :StompAdapter
}

broadcast = SMQueue(configuration)
broadcast.put message.to_json, "content-type" => "application/json"

You’ll need Ruby and RubyGems installed. Once you have those, install the script like this:

sudo su -
gem sources -a http://gems.github.com/
gem install seanohalpin-smqueue json --no-ri --no-rdoc
cd /usr/bin
wget http://gist.github.com/raw/306765/2a3e9cbade88b4c6dd430e108bc8a28f95047462/notify-service-by-stomp.rb
chmod +x notify-service-by-stomp.rb
Once it's installed tell Nagios to use it by adding this to your Nagios configuration:
define command {
  command_name notify-service-by-stomp
  command_line /usr/bin/notify-service-by-stomp.rb mq.example.com /topic/foo.bar.baz.quux $HOSTADDRESS$ "$SERVICEDESC$" $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEDURATIONSEC$ $SERVICEATTEMPT$ $MAXSERVICEATTEMPTS$
}

global_service_event_handler=notify-service-by-stomp

Change mq.example.com to be the hostname of your message broker, and /topic/foo.bar.baz.quux to be the topic or queue that you’d like notifications to be sent to. Restart Nagios and you should start receiving notifications on that queue or topic.

If you don’t receive notifications form Nagios very often then a simple way to test that this is working is to attach stompcat – a cat type tool that uses STOMP as a source – to the topic or queue, then send a few test notifications to the same queue by manually running the same command that Nagios would.

Here’s a simple stompcat tool in case you don’t have one handy:

#! /usr/bin/env ruby

# Run me like this:
#
#   ./stompcat.rb mq.example.com /topic/foo.bar.baz.quux
#

require 'rubygems'
require 'smqueue'

configuration = {
  :host => ARGV[0],
  :name => ARGV[1],
  :adapter => :StompAdapter
}

source = SMQueue(configuration)
source.get do |m|
  payload = m.body
  puts ">>> #{payload}"
end

Here’s how to send notifications to the queue or topic:

/usr/bin/notify-service-by-stomp.rb mq.example.com \
  /topic/foo.bar.baz.quux service-host.example.com "SERVICE NAME" \
  WARNING HARD 86492 6 6

If it’s working you should get an entry like this showing up where you’re running the stompcat:

{
  "time_t":1266427384,
  "state":"WARNING",
  "state_type":"HARD",
  "state_time":86492,
  "attempt":6,
  "hostname":"service-host.example.com",
  "max_attempts":6,
  "service":"SERVICE NAME"
}

You should be able to change the stompcat example to perform more complex and interesting actions – looking up clients in a database, sending text messages if an account has enough credit, whatever you fancy. If you come up with something fun, please let me know!

written by
Craig
published
2010-02-17
Disagree? Found a typo? Got a question?
If you'd like to have a conversation about this post, email craig@barkingiguana.com. I don't bite.
You can verify that I've written this post by following the verification instructions:
curl -LO http://barkingiguana.com/2010/02/17/decoupling-nagios-host-and-service-check-events-for-fun-and-profit.html.orig
curl -LO http://barkingiguana.com/2010/02/17/decoupling-nagios-host-and-service-check-events-for-fun-and-profit.html.orig.asc
gpg --verify decoupling-nagios-host-and-service-check-events-for-fun-and-profit.html.orig{.asc,}