Nagios does a solid job of watching over my services and hosts, but I want to do a lot more with the events it generates — when a check fails, when something recovers. Specifically, I want to give clients incredibly fine-grained control over their notifications: what services, how often, and at what level of technical detail. I also want to use those events as upsell opportunities for Xeriom — if a disk is filling up or bandwidth is being consumed faster than expected, it should be easy to suggest a plan upgrade. And I'd like to experiment with fun delivery mechanisms — iPhone push notifications, SMS gateways, audible alarms, whatever — without any risk of breaking Nagios itself.
Message queues are the natural solution here. They let you decouple systems, moving complexity and risk away from the core. Nagios shouldn't have to worry about any of this extra stuff. It should just do what it's good at: monitoring hosts and services.
Luckily, I already have ActiveMQ running for other tasks, writing a STOMP client with SMQueue is straightforward, and Nagios has several ways to execute external commands when events occur, including the global host and service event handlers. All I need is a command that accepts event data from Nagios and drops it onto the message queue.
Here's what I came up with:
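```ruby
#!/usr/bin/env ruby
# Publish a Nagios service event to a STOMP destination as JSON.
# Usage: notify-service-by-stomp.rb BROKER DESTINATION HOST SERVICE STATE
#        STATE_TYPE DURATION_SEC ATTEMPT MAX_ATTEMPTS
require 'rubygems'
require 'smqueue'
require 'json'

# Event details, in the order Nagios passes them on the command line.
message = {
  :hostname     => ARGV[2],
  :service      => ARGV[3],
  :state        => ARGV[4],
  :state_type   => ARGV[5],
  :state_time   => ARGV[6].to_i,
  :attempt      => ARGV[7].to_i,
  :max_attempts => ARGV[8].to_i,
  :time_t       => Time.now.to_i
}

# The broker host and the topic/queue name come first.
configuration = {
  :host    => ARGV[0],
  :name    => ARGV[1],
  :adapter => :StompAdapter
}

broadcast = SMQueue(configuration)
broadcast.put message.to_json, "content-type" => "application/json"
```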
You'll need Ruby and RubyGems installed. Once you have those, install the dependencies and the script like this:
```sh
sudo su -
gem sources -a http://gems.github.com/
gem install seanohalpin-smqueue json --no-ri --no-rdoc
cd /usr/bin
wget http://gist.github.com/raw/306765/2a3e9cbade88b4c6dd430e108bc8a28f95047462/notify-service-by-stomp.rb
chmod +x notify-service-by-stomp.rb
```
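Once the script is installed, tell Nagios to use it. The command definition goes in one of your object configuration files:

```
define command {
    command_name notify-service-by-stomp
    command_line /usr/bin/notify-service-by-stomp.rb mq.example.com /topic/foo.bar.baz.quux $HOSTADDRESS$ "$SERVICEDESC$" $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEDURATIONSEC$ $SERVICEATTEMPT$ $MAXSERVICEATTEMPTS$
}
```

The global handler is a main configuration option, so it belongs in nagios.cfg. Event handlers also need to be enabled there with enable_event_handlers=1.

```
global_service_event_handler=notify-service-by-stomp
```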
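Change mq.example.com to the hostname of your message broker, and change /topic/foo.bar.baz.quux to the topic or queue you want events sent to. After you restart Nagios, events should start flowing whenever Nagios runs its event handlers. That happens while a problem is in a soft state, when a service first enters a hard problem state, and when it recovers.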
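The script reads every field by position. If a command_line drops or reorders a macro, the script will quietly publish a message containing nil or zero values. If you would rather have it fail loudly, here is a minimal sketch that moves the ARGV-to-field mapping into a method and checks both the argument count and the numeric fields. The build_message name and the constants are my own and are not part of the original script. The sketch assumes the same nine-argument order as the command_line above.

```ruby
# Positional fields after the broker host (ARGV[0]) and destination (ARGV[1]),
# in the order the Nagios command_line passes them.
EVENT_FIELDS = [:hostname, :service, :state, :state_type,
                :state_time, :attempt, :max_attempts]

# Nagios passes these as numeric macros; convert them to integers.
INTEGER_FIELDS = [:state_time, :attempt, :max_attempts]

# Hypothetical helper: build the event hash from the full argument list.
# Raises ArgumentError on a wrong argument count or a non-numeric value
# instead of publishing a half-empty message.
def build_message(args, now = Time.now)
  expected = 2 + EVENT_FIELDS.size
  unless args.size == expected
    raise ArgumentError, "expected #{expected} arguments, got #{args.size}"
  end

  message = {}
  EVENT_FIELDS.each_with_index do |field, i|
    value = args[i + 2]
    message[field] = INTEGER_FIELDS.include?(field) ? Integer(value, 10) : value
  end
  message[:time_t] = now.to_i
  message
end
```

In the script itself, you would replace the literal hash with `message = build_message(ARGV)`. The sketch uses Integer(value, 10) rather than to_i because to_i silently turns a malformed value into 0.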
Testing it
If your Nagios doesn't generate events very often, you'll want a way to verify everything is wired up correctly. Attach a simple stompcat listener to the topic, then manually fire some test notifications.
Here's a quick stompcat tool in case you don't have one handy:
```ruby
#! /usr/bin/env ruby
# Run me like this:
#
#   ./stompcat.rb mq.example.com /topic/foo.bar.baz.quux
#
require 'rubygems'
require 'smqueue'

configuration = {
  :host    => ARGV[0],
  :name    => ARGV[1],
  :adapter => :StompAdapter
}

source = SMQueue(configuration)
source.get do |m|
  payload = m.body
  puts ">>> #{payload}"
end
```
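The script publishes compact JSON, so stompcat prints each event as one long line. If you want the output laid out like the sample further down, a small formatter is enough. This is a sketch rather than part of the original tool, and format_event is a hypothetical name. It falls back to the raw body, so any non-JSON messages on the same topic still show up.

```ruby
require 'json'

# Hypothetical helper for stompcat: pretty-print a message body if it is
# JSON, otherwise return it unchanged.
def format_event(body)
  JSON.pretty_generate(JSON.parse(body))
rescue JSON::JSONError
  # Not valid JSON (or not representable): show the body as received.
  body
end
```

To use it, add `require 'json'` to stompcat and change the print line to `puts ">>> #{format_event(m.body)}"`.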
And here's how to send a test notification to the queue:
```sh
/usr/bin/notify-service-by-stomp.rb mq.example.com \
  /topic/foo.bar.baz.quux service-host.example.com "SERVICE NAME" \
  WARNING HARD 86492 6 6
```
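A single WARNING only exercises one path through whatever consumes the queue. As a sketch, a short Ruby script can fire one test event for each standard service state by calling the notification script directly. The script path, broker and topic are the example values from this post, test_invocations is a hypothetical helper, and the duration and attempt values are arbitrary.

```ruby
# Example values from this post; change them to match your setup.
SCRIPT      = '/usr/bin/notify-service-by-stomp.rb'
BROKER      = 'mq.example.com'
DESTINATION = '/topic/foo.bar.baz.quux'

# The four standard Nagios service states.
STATES = %w[OK WARNING CRITICAL UNKNOWN]

# Hypothetical helper: one argument list per state, in the order the
# script expects (duration 60s, attempt 3 of 3, chosen arbitrarily).
def test_invocations(host = 'service-host.example.com', service = 'SERVICE NAME')
  STATES.map do |state|
    [SCRIPT, BROKER, DESTINATION, host, service, state, 'HARD', '60', '3', '3']
  end
end

# Only fire events when run directly and the script is actually installed.
if __FILE__ == $0 && File.executable?(SCRIPT)
  test_invocations.each do |argv|
    ok = system(*argv)
    warn "failed: #{argv.join(' ')}" unless ok
  end
end
```

Passing separate arguments to system bypasses the shell, so "SERVICE NAME" needs no extra quoting.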
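If it's working, you should see something like this in your stompcat output. It is pretty-printed here for readability; the script actually sends it as a single line of JSON, which stompcat prints after a ">>>" prefix.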
```json
{
  "time_t": 1266427384,
  "state": "WARNING",
  "state_type": "HARD",
  "state_time": 86492,
  "attempt": 6,
  "hostname": "service-host.example.com",
  "max_attempts": 6,
  "service": "SERVICE NAME"
}
```
From here, you can modify the stompcat example to do anything you like — look up clients in a database, send SMS alerts if an account has enough credit, trigger webhooks, whatever takes your fancy. If you build something fun with this, I'd love to hear about it.
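As a starting point, here is a minimal sketch of the decision step on its own. It is kept apart from the queue so you can test it without a broker. Everything in it is hypothetical: the CONTACTS table stands in for a client database, and route_event and the channel symbols are names I've made up. The sketch ignores SOFT states because Nagios also runs event handlers while a problem is still being retried, not only on hard state changes and recoveries.

```ruby
require 'json'

# Hypothetical contact table: which channels each host's owner wants for
# each state. In a real consumer this would come from a client database.
CONTACTS = {
  'service-host.example.com' => {
    'CRITICAL' => [:sms, :email],
    'WARNING'  => [:email],
    'OK'       => [:email]
  }
}

# Decide who to tell about one event body taken from the queue.
# Returns [channel, summary] pairs; an empty list means do nothing.
def route_event(body, contacts = CONTACTS)
  event = JSON.parse(body)

  # Soft states are still being retried by Nagios; wait for a hard state.
  return [] unless event['state_type'] == 'HARD'

  channels = contacts.fetch(event['hostname'], {}).fetch(event['state'], [])
  summary = "#{event['service']} on #{event['hostname']} is #{event['state']}"
  channels.map { |channel| [channel, summary] }
end
```

In stompcat, you would call it from the source.get block, for example `route_event(m.body).each { |channel, summary| ... }`, and hand each pair to whatever sends the SMS, push notification or webhook.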