Decoupling Nagios Host and Service check events for fun and profit

Nagios does a pretty good job of watching over my services and hosts, but I want to do a little more with the events it creates – when it checks a service and something is wrong, or when something recovers. In particular I want to give my clients the ability to select at an incredibly high resolution what sort of notifications they get, for what services, how often, and at what level of technical detail. Coupled with this I want to up-sell the services that Xeriom offers – if the disk is getting full or the transfer quota is being consumed so fast that it won’t last until the end of the month, I want to make it easy to upgrade plans. I’d also like to be able to try out some fun things – iPhone push notifications, SMS gateways, audible alarms, whatever – without worrying that I might destroy Nagios and bring my monitoring setup to its knees.

Message queues are a great way of decoupling systems, moving risk and complexity elsewhere. Nagios shouldn’t have to worry about all of the stuff I want to build around the monitoring system; it should focus just on the core features that I like it for: monitoring my hosts and services.

Luckily, I already have ActiveMQ running for other tasks, writing a STOMP client using SMQueue is pretty trivial, and Nagios has several ways to execute external commands when events happen including the global host and service event handlers. All I need is a command to have Nagios run that’ll accept a bunch of information from Nagios and stick it on the message queue.

Here’s what I came up with:

#! /usr/bin/env ruby

require 'rubygems'
require 'smqueue'
require 'json'

message = {
  :hostname => ARGV[2],
  :service => ARGV[3],
  :state => ARGV[4],
  :state_type => ARGV[5],
  :state_time => ARGV[6].to_i,
  :attempt => ARGV[7].to_i,
  :max_attempts => ARGV[8].to_i,
  :time_t => Time.now.to_i
}

configuration = {
  :host => ARGV[0],
  :name => ARGV[1],
  :adapter => :StompAdapter
}

broadcast = SMQueue(configuration)
broadcast.put message.to_json, "content-type" => "application/json"

You’ll need Ruby and RubyGems installed. Once you have those, install the script like this:

sudo su -
gem sources -a http://gems.github.com/
gem install seanohalpin-smqueue json --no-ri --no-rdoc
cd /usr/bin
wget http://gist.github.com/raw/306765/2a3e9cbade88b4c6dd430e108bc8a28f95047462/notify-service-by-stomp.rb
chmod +x notify-service-by-stomp.rb

Once it's installed, tell Nagios to use it by adding this to your Nagios configuration:

define command {
  command_name notify-service-by-stomp
  command_line /usr/bin/notify-service-by-stomp.rb mq.example.com /topic/foo.bar.baz.quux $HOSTADDRESS$ "$SERVICEDESC$" $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEDURATIONSEC$ $SERVICEATTEMPT$ $MAXSERVICEATTEMPTS$
}

global_service_event_handler=notify-service-by-stomp

Change mq.example.com to be the hostname of your message broker, and /topic/foo.bar.baz.quux to be the topic or queue that you’d like notifications to be sent to. Restart Nagios and you should start receiving notifications on that queue or topic.
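
The script reads everything by position, so a missing macro in the command_line quietly shifts or blanks fields. As a rough sketch of how the argument handling could be made stricter, here is a hypothetical build_message helper (the name and the ARGUMENT_NAMES list are mine, not part of the script above) that checks the argument count before building the same hash:

```ruby
require 'json'

# Positional arguments in the order used by the command_line definition:
# broker host, destination, then the values of the Nagios macros.
ARGUMENT_NAMES = %w[
  broker destination hostname service state state_type
  state_time attempt max_attempts
].freeze

# Hypothetical helper, not part of the original script: builds the message
# hash from an ARGV-style array, or raises if any arguments are missing.
def build_message(argv, now = Time.now)
  if argv.length < ARGUMENT_NAMES.length
    missing = ARGUMENT_NAMES[argv.length..-1].join(', ')
    raise ArgumentError, "missing arguments: #{missing}"
  end

  {
    :hostname     => argv[2],
    :service      => argv[3],
    :state        => argv[4],
    :state_type   => argv[5],
    :state_time   => argv[6].to_i,
    :attempt      => argv[7].to_i,
    :max_attempts => argv[8].to_i,
    :time_t       => now.to_i
  }
end

example = [
  'mq.example.com', '/topic/foo.bar.baz.quux', 'service-host.example.com',
  'SERVICE NAME', 'WARNING', 'HARD', '86492', '6', '6'
]
puts build_message(example).to_json
```

Passing the time in as a parameter is only there to make the helper easy to exercise by hand; the script itself would call it with ARGV.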

If you don’t receive notifications from Nagios very often then a simple way to test that this is working is to attach stompcat – a cat type tool that uses STOMP as a source – to the topic or queue, then send a few test notifications to the same queue by manually running the same command that Nagios would.

Here’s a simple stompcat tool in case you don’t have one handy:

#! /usr/bin/env ruby

# Run me like this:
#
#   ./stompcat.rb mq.example.com /topic/foo.bar.baz.quux
#

require 'rubygems'
require 'smqueue'

configuration = {
  :host => ARGV[0],
  :name => ARGV[1],
  :adapter => :StompAdapter
}

source = SMQueue(configuration)
source.get do |m|
  payload = m.body
  puts ">>> #{payload}"
end

Here’s how to send notifications to the queue or topic:

/usr/bin/notify-service-by-stomp.rb mq.example.com \
  /topic/foo.bar.baz.quux service-host.example.com "SERVICE NAME" \
  WARNING HARD 86492 6 6

If it’s working you should get an entry like this showing up where you’re running the stompcat:

{
  "time_t":1266427384,
  "state":"WARNING",
  "state_type":"HARD",
  "state_time":86492,
  "attempt":6,
  "hostname":"service-host.example.com",
  "max_attempts":6,
  "service":"SERVICE NAME"
}

You should be able to change the stompcat example to perform more complex and interesting actions – looking up clients in a database, sending text messages if an account has enough credit, whatever you fancy. If you come up with something fun, please let me know!
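
As a starting point, here's a minimal sketch of the sort of decision a smarter consumer might make with each payload. The routing rule and the action names (:sms, :email and so on) are made up for illustration; only the JSON field names come from the message format above.

```ruby
require 'json'

# Hypothetical routing rule: ignore SOFT states, then pick a channel based
# on how serious the state is. The action names are placeholders.
def action_for(payload)
  event = JSON.parse(payload)
  return :ignore unless event['state_type'] == 'HARD'

  case event['state']
  when 'CRITICAL' then :sms
  when 'WARNING'  then :email
  when 'OK'       then :recovery_email
  else                 :ignore
  end
end

payload = '{"state":"WARNING","state_type":"HARD",' \
          '"hostname":"service-host.example.com","service":"SERVICE NAME"}'
puts action_for(payload)
```

In the stompcat loop this would be called with m.body, and the returned symbol would decide which notifier to hand the event to.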

First Steps with RabbitMQ in Ruby 1.8.6

Until recently I was more than happy using ActiveMQ as my message broker. I had heard of RabbitMQ several times but never took the chance to look into it. A recent talk at LRUG made me decide that I had left it too long and that if I didn't start investigating soon I'd be left behind.

Here's how I got started using RabbitMQ 1.6.0 on OS X under Ruby 1.8.6.

Installation

 mkdir /tmp/rabbit-mq && cd /tmp/rabbit-mq
 wget http://www.rabbitmq.com/releases/rabbitmq-server/v1.6.0/rabbitmq-server-generic-unix-1.6.0.tar.gz
 tar -xzvf rabbitmq-server-generic-unix-1.6.0.tar.gz
 sudo mv rabbitmq_server-1.6.0/ /opt/local/lib

Running the server

sudo /opt/local/lib/rabbitmq_server-1.6.0/sbin/rabbitmq-server

Seriously, that's it.

Passing messages

When I wrote about getting started with SMQueue I created a producer that pushed timestamps onto the queue and a consumer that printed the values from the queue to the terminal. Recreating that using the AMQP gem is simple.

First, install the AMQP gem.

gem sources -a http://gems.github.com
gem install tmm1-amqp

Open an IRB session and paste this code to create a producer:

require 'mq'
EM.run {
  broker = MQ.new
  EM.add_periodic_timer(1) { 
    broker.queue("timestamps").publish(Time.now.to_f)
  }
}

Open another IRB session and paste this to create a consumer:

require 'mq'
EM.run {
  broker = MQ.new
  broker.queue("timestamps").subscribe { |timestamp|
    time = Time.at(timestamp.to_f)
    puts "Got #{timestamp} which is #{time}"
  }
}

Profit. RabbitMQ is extremely easy to get started with. I don't imagine that it would take too much effort to write an adaptor for SMQueue to easily change deployed projects to use it without changing their implementation. If you do this I'd love to hear about it.

A Starling Adapter for SMQueue

Starling is a persistent, lightweight work queue implemented in Ruby which talks the memcache protocol. I've recently started playing with it because I don't have the resource to look after, or the requirement for, a full blown service bus. Starling is easier to install and configure than ActiveMQ, but it's nowhere near as fully featured. Both have their place but a discussion of when and where to use them is outside the scope of this article.

I knew that I wanted to use a message bus to turn synchronous requests into asynchronous requests, pushing work off to some background process somewhere. What I didn't know was the form that the message bus would take. If you're familiar with the Gang of Four patterns book you've probably picked out the pattern that I should use here, but to be honest I'm buggered if I know what it's called. SMQueue, which I'm familiar with, provides a nice abstraction that makes it easy to swap out the message bus implementation while the code remains identical. Lovely, but SMQueue doesn't come with an adaptor for Starling.

"How hard," thought I, "would it be to implement a Starling adapter for SMQueue?"

I blinked and suddenly it existed. Awesome.

require 'rubygems'
require 'smqueue'
require 'starling'
require 'yaml'

module BarkingIguana
  module Messaging
    module SMQueue
      class StarlingAdapter < ::SMQueue::Adapter
        class Configuration < ::SMQueue::AdapterConfiguration
          DEFAULT_SERVER = '127.0.0.1:22122'

          has :queue
          has :server, :default => DEFAULT_SERVER
        end

        def initialize(*args)
          super
          options = args.first
          @configuration = options[:configuration]
          @configuration[:server] ||= Configuration::DEFAULT_SERVER

          @client = ::Starling.new(@configuration[:server])
        end

        def put(*args, &block)
          @client.set @configuration[:queue], args[0].to_yaml
        end

        def get(*args, &block)
          if block_given?
            loop do
              yield next_message
            end
          else
            next_message
          end
        end

        private
        def next_message
          ::SMQueue::Message(:headers => {},
            :body => YAML.load(@client.get(@configuration[:queue])))
        end
      end
    end
  end
end

Want to use it? You'll need Starling running somewhere. After that you can implement a producer in two lines of code:

producer = SMQueue(:adapter => BarkingIguana::Messaging::SMQueue::StarlingAdapter, :queue => "some.queue.name")
producer.put "Quack quack"

On the other side of the connection, here's a sample consumer:

consumer = SMQueue(:adapter => BarkingIguana::Messaging::SMQueue::StarlingAdapter, :queue => "some.queue.name")
consumer.get do |message|
  puts message.body.inspect
  # => "Quack quack"
end

One thing that's different about this adapter compared to the current SMQueue adapters is that it assumes you want to use YAML as a transport format. I'd prefer to use XML or JSON, but at the moment that's just a preference: YAML is the easiest to implement, and I'm lazy.
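
To give an idea of what a pluggable transport format might look like, here's a rough sketch. MemoryAdapter and JsonSerializer are hypothetical names of mine, and the in-memory array stands in for Starling so the example runs on its own; it isn't an SMQueue class and it isn't how the adapter above works today.

```ruby
require 'yaml'
require 'json'

# Hypothetical in-memory stand-in for a queue adapter. It mirrors the
# put/get shape of the Starling adapter but holds messages in an array.
class MemoryAdapter
  def initialize(serializer)
    @serializer = serializer
    @messages = []
  end

  # Serialize the body with whichever format was configured.
  def put(body)
    @messages.push(@serializer.dump(body))
  end

  # Return the oldest message, deserialized, or nil if the queue is empty.
  def get
    raw = @messages.shift
    raw.nil? ? nil : @serializer.load(raw)
  end
end

# A serializer only needs to respond to dump and load, as YAML already does.
module JsonSerializer
  def self.dump(object)
    JSON.generate(object)
  end

  def self.load(string)
    JSON.parse(string)
  end
end

message = { 'say' => 'Quack quack' }

[YAML, JsonSerializer].each do |serializer|
  queue = MemoryAdapter.new(serializer)
  queue.put(message)
  puts queue.get.inspect
end
```

The same idea would presumably carry over to the real adapter by making the serializer a configuration option that defaults to YAML.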

There's also a bunch of work to do around failover - this adapter only supports one server. I still don't know enough about how Starling would handle failover so I don't want to rush into implementing that and discover I've done it wrong.

If you can help by providing patches for either other transport formats or failover please do.

Using SMQueue with message queues that failover

Previously I wrote about using SMQueue to create some simple consumers and producers for message queues. I also wrote about setting up a high availability message store. In the case of a failure the message queue will turn the slave node into the master. Unfortunately the producer and consumer I created will forever try to reconnect to the now-dead ex-master node.

Using smqueue 0.1.0 (which was produced when I created the simple producer and consumer) it's trivial to add failover capabilities to the clients. Where the SMQueue instance is created simply add another key, secondary_host, to the configuration and point it to the second broker.

queue = SMQueue(
  :name => "/queue/numbers.ascending",
  :host => "mq1.domain.com",
  :secondary_host => "mq2.domain.com",
  :adapter => :StompAdapter
)

I think the plan is to add support for more than two broker nodes, and for failover strategies, in future versions of SMQueue.
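
For a feel of what failover across any number of brokers involves, here's a minimal sketch of the general idea. It is not how SMQueue implements secondary_host; connect_with_failover is a hypothetical helper, and the block stands in for whatever actually opens the connection.

```ruby
# Hypothetical helper: try each broker in order and return the result of
# the first connection attempt that doesn't raise.
def connect_with_failover(hosts)
  errors = []
  hosts.each do |host|
    begin
      return yield(host)
    rescue StandardError => e
      errors << "#{host}: #{e.message}"
    end
  end
  raise "no broker reachable (#{errors.join('; ')})"
end

# Stand-in connector for the example: pretend that mq1 is down.
connection = connect_with_failover(%w[mq1.domain.com mq2.domain.com]) do |host|
  raise 'connection refused' if host == 'mq1.domain.com'
  "connected to #{host}"
end

puts connection
```

A real client would also need to run the same loop again whenever an established connection drops, which is where the choice of failover strategy starts to matter.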

Writing Ruby/Stomp clients with SMQueue

SMQueue makes writing Ruby clients that interact with message queues pretty much trivial. It's got adaptors for Spread, Stomp and Stdio. Which is pretty handy 'cause that message queue I set up a few weeks back talks Stomp and I'm quite into Ruby.

Installing SMQueue

The original SMQueue repository doesn't yet have a way of producing a gem so there are two ways to install SMQueue: add a vendor/gems/smqueue directory to your project or build a gem from my SMQueue repository. Oddly enough, I've gone with the latter approach.

Clone my repository and you'll see there's a gemspec file. You can use that to build a gem using the gem command. The whole process looks something like this:

git clone http://barkingiguana.com/~craig/smqueue.git
cd smqueue
gem build smqueue.gemspec
sudo gem install ./smqueue-0.1.0.gem

I'm reliably informed that when SMQueue does build into a gem it'll start at 0.2.0 so having a 0.1.0 installed won't cause any clashes.

Note that I've removed the Spread adaptor from my branch because I don't have a working spread client on my system and I can't get SMQueue to load without one. I'm sure that'll be fixed in a future release.

Basic assumptions

I've made the following assumptions for this article: that you have a working Ruby 1.8.6 install, and that you have an ActiveMQ instance running locally with the Stomp connector enabled. You'll have to change the code to match your environment if these assumptions aren't correct.

A simple producer

Now that SMQueue is installed I'll take a contrived example and implement it. Let's say I want an ascending number to be put onto a queue roughly every second. A pretty good source for these numbers might be the current time represented as seconds from the epoch. Handily I can get just such a number really easily in Ruby.

>> Time.now.to_i
=> 1230602445
>> Time.now.to_i
=> 1230602446
>> Time.now.to_i
=> 1230602447

I can get a number to be output every second by wrapping it in a loop and sleeping a second at the end of the loop.

>> loop do
?>   puts Time.now.to_i
>>   sleep 1
>> end
1230602557
1230602558
1230602559

Easy enough to get them on STDOUT, but how do I get them into a queue? Well, for that I need to start using the SMQueue library, create a client for the queue, and put a representation of the number onto the queue.

require 'rubygems'
require 'smqueue'

queue = SMQueue(
  :name => "/queue/numbers.ascending",
  :host => "localhost",
  :adapter => :StompAdapter
)

loop do
  number = Time.now.to_i
  puts "Sending #{number}"
  queue.put number.to_yaml
  sleep 1
end

Paste the below into a terminal somewhere to kick off the producer. You should see a steady stream of output - about one message a second - saying that it's sending a number.

cat > producer.rb <<EOF
require 'rubygems'
require 'smqueue'

queue = SMQueue(
  :name => "/queue/numbers.ascending",
  :host => "localhost",
  :adapter => :StompAdapter
)

loop do
  number = Time.now.to_i
  puts "Sending #{number}"
  queue.put number.to_yaml
  sleep 1
end
EOF
ruby producer.rb

A simple consumer

Now that I have a simple producer running, I'll take the messages and convert them back into a time. It's a pretty pointless task for the consumer, but it'll show just how easy it is to write one.

require 'rubygems'
require 'smqueue'
require 'yaml'

queue = SMQueue(
  :name => "/queue/numbers.ascending",
  :host => "localhost",
  :adapter => :StompAdapter
)

queue.get do |message|
  number = YAML.parse(message.body).transform
  time = Time.at(number)
  puts "Got #{number} which is #{time}"
end

Let's go through the important parts in more detail.

I tell the queue that I want to capture messages.

queue.get do |message|

The producer put the messages in as YAML so I need to transform them back to their native state. I can do this by parsing the YAML then transforming it.

number = YAML.parse(message.body).transform

Now that I have the number, I convert it to a time and output both the original number and the calculated time.

time = Time.at(number)
puts "Got #{number} which is #{time}"

That's pretty much it... run the below code to start running the consumer.

cat > consumer.rb <<EOF
require 'rubygems'
require 'smqueue'
require 'yaml'

queue = SMQueue(
  :name => "/queue/numbers.ascending",
  :host => "localhost",
  :adapter => :StompAdapter
)

queue.get do |message|
  number = YAML.parse(message.body).transform
  time = Time.at(number)
  puts "Got #{number} which is #{time}"
end
EOF
ruby consumer.rb

For each message that your producer creates you should now see your consumer print a message to the screen.

High Availability ActiveMQ using a MySQL datastore

Now that we have ActiveMQ deployed it'd be quite nice to reduce the impact of a broker being unavailable - perhaps because it's dropped off the network, or because we want to upgrade the kernel or ActiveMQ install. Let's set up a High Availability ActiveMQ cluster.

High Availability Options

There are lots of ways to run ActiveMQ as a master / slave cluster for HA but we already have an HA MySQL setup so I'd like to use that as the datastore. In ActiveMQ terms that means I'd like to set up a JDBC master / slave cluster.

Setting up ActiveMQ to use a MySQL Datastore

It turns out that this is really easy to set up. First, configure ActiveMQ to use MySQL, then make sure you're using InnoDB. The only change I made to these instructions was to change dataDirectory="${activemq.base}/activemq-data" to dataDirectory="${activemq.base}/data". Remember to change the broker name in activemq.xml to match the machine name. You've now got one broker running with a MySQL datastore.

Adding a slave for failover

To set up a slave, install a second instance of ActiveMQ doing exactly the same as above - just make sure the broker name is unique. Umm... that's it!

Starting the cluster

Start the DaemonTools services. It doesn't really matter which broker is master so it doesn't matter which order you start them in.

svc -u /etc/service/activemq

When you tail the logs of both brokers you should see one stop after loading the database driver. It's trying to acquire the lock on the datastore and will stay here until the master fails and the lock is released. At that point it will take over as the master.

You can test failover by shutting down the current master. Success is shown in the logs of the slave that's taking over as master: it'll say it's acquired the lock.
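
If the lock-based hand-over seems a bit magical, here's a small sketch of the same idea using a file lock in place of the database lock. This is only an analogy I've put together to show the behaviour: ActiveMQ takes its lock in the datastore, not on a file, and try_to_become_master is a made-up helper. It relies on flock treating two separately opened handles as competing for the lock, which is the case on Linux and OS X.

```ruby
require 'tempfile'

# Stand-in for the shared datastore lock. A temporary file keeps the
# example self-contained.
lock_file = Tempfile.new('broker-lock')

# Hypothetical helper: returns true if this handle now holds the lock.
# With LOCK_NB, flock returns false instead of waiting when the lock is taken.
def try_to_become_master(handle)
  handle.flock(File::LOCK_EX | File::LOCK_NB) == 0
end

# Two "brokers", each with its own handle on the shared lock.
broker_a = File.open(lock_file.path, 'r+')
broker_b = File.open(lock_file.path, 'r+')

a_is_master = try_to_become_master(broker_a)
b_before    = try_to_become_master(broker_b)
puts "A is master: #{a_is_master}, B is master: #{b_before}"

# The master "fails": closing its handle releases the lock.
broker_a.close
b_after = try_to_become_master(broker_b)
puts "After A stops, B is master: #{b_after}"

broker_b.close
lock_file.close!
```

The real slave blocks while waiting for the lock rather than polling, which is why its log appears to stop after loading the database driver.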

Deploying ActiveMQ on Ubuntu 8.10

I used Ubuntu 8.10 in this article but the instructions will probably work on 8.04 and 7.10 as well. I've not tested those though, and I'm not sure if it'll work on other versions of Ubuntu. Feedback would be awesome.

Prerequisites

ActiveMQ is a Java application so, well, you'll need Java installed.

sudo apt-get install openjdk-6-jre

Installing ActiveMQ

  1. Grab the latest stable release using wget. I used 5.2.0.
    wget http://www.apache.org/dist/activemq/apache-activemq/5.2.0/apache-activemq-5.2.0-bin.tar.gz
  2. Unpack it somewhere. I use /usr/local although I believe this may be bad practice. Leave a comment if there's somewhere better for this!
    sudo tar -xzvf apache-activemq-5.2.0-bin.tar.gz -C /usr/local/
  3. Configure the broker name in /usr/local/apache-activemq-5.2.0/conf/activemq.xml (replace all instances of "localhost" with the actual machine name)
  4. Start ActiveMQ by running /usr/local/apache-activemq-5.2.0/bin/activemq
  5. Fire up a browser and browse to http://brokername:8161/admin. You should see the ActiveMQ admin console.

Keeping ActiveMQ running

    Running ActiveMQ (or indeed any service you don't absolutely have to) as root is a Bad Idea. Create an activemq user and make the data directory be owned by them.

    sudo adduser --system activemq
    sudo chown -R activemq /usr/local/apache-activemq-5.2.0/data

    I run ActiveMQ under DaemonTools to make sure it's always up. If you haven't already, install DaemonTools.

    Create a service directory for activemq and populate it with the required scripts.

    sudo mkdir -p /usr/local/apache-activemq-5.2.0/service/activemq/{,log,log/main}

    /usr/local/apache-activemq-5.2.0/service/activemq/run should look like this.

    #!/bin/sh
    exec 2>&1
    
    USER=activemq
    
    exec softlimit -m 1073741824 \
         setuidgid $USER \
    /usr/local/apache-activemq-5.2.0/bin/activemq

    /usr/local/apache-activemq-5.2.0/service/activemq/log/run should look like this.

    #!/bin/sh
    USER=activemq
    exec setuidgid $USER multilog t s1000000 n10 ./main

    Make both run scripts executable, make the log/main directory owned by activemq, and symlink the activemq service directory into /etc/service/.

    sudo sh -c "find /usr/local/apache-activemq-5.2.0/service/activemq -name 'run' |xargs chmod +x,go-wr"
    sudo chown activemq /usr/local/apache-activemq-5.2.0/service/activemq/log/main
    sudo ln -s /usr/local/apache-activemq-5.2.0/service/activemq /etc/service/activemq

    Now turn the keys and start it up.

    sudo svc -u /etc/service/activemq

    Tail the logs to make sure everything is happening as you'd expect.

    sudo tail -F /etc/service/activemq/log/main/current

Trouble-shooting

    When I did this I got a bunch of stack traces with the following message.

    Caused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'org.apache.activemq.xbean.XBeanBrokerService#0' defined in class path resource [activemq.xml]: Invocation of init method failed; nested exception is java.lang.RuntimeException: java.io.FileNotFoundException: /usr/local/apache-activemq-5.2.0/data/kr-store/state/hash-index-store-state_state (Permission denied)

    This was because I stopped ActiveMQ after I changed ownership of the data directory causing it to dump the state file owned by another user. If you get the same problem just change the ownership of the data directory again.

Thanks

    Thanks to Sean O'Halpin who introduced me to message queues and ActiveMQ (but who doesn't have a homepage or blog that I can link to) and Dave Evans who introduced me to Daemon Tools.

About the boy

Craig Webster is a software engineer living in London. He usually works with Ruby although sometimes he sneaks in some Erlang or JavaScript. He's into rock climbing, snowboarding, skating, photography and fencing. Yes, this does mean he has a sword.

Licence

The entire content of this blog is public domain. Use it however you fancy. You don't even need to attribute it to me, although it would be nice if you did. Just don't sue me and we'll all be happy.
