Decoupling Nagios Host and Service check events for fun and profit

Nagios does a pretty good job of watching over my services and hosts, but I want to do a little more with the events it creates – when it checks a service and something is wrong, or when something recovers. In particular I want to give my clients the ability to select at an incredibly high resolution what sort of notifications they get, for what services, how often, and at what level of technical detail. Coupled with this I want to up-sell the services that Xeriom offers – if the disk is getting full or the transfer quota is being consumed so fast that it won’t last until the end of the month, I want to make it easy to upgrade plans. I’d also like to be able to try out some fun things – iPhone push notifications, SMS gateways, audible alarms, whatever – without worrying that I might destroy Nagios and bring my monitoring setup to its knees.

Message queues are a great way of decoupling systems, moving risk and complexity elsewhere. Nagios shouldn’t have to worry about all of the stuff I want to build around the monitoring system; it should focus just on the core features that I like it for: monitoring my hosts and services.

Luckily, I already have ActiveMQ running for other tasks, writing a STOMP client using SMQueue is pretty trivial, and Nagios has several ways to execute external commands when events happen including the global host and service event handlers. All I need is a command to have Nagios run that’ll accept a bunch of information from Nagios and stick it on the message queue.

Here’s what I came up with:

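#! /usr/bin/env ruby

# Run me like this (Nagios passes the arguments in this order):
#
#   notify-service-by-stomp.rb BROKER DESTINATION HOST SERVICE STATE \
#     STATE_TYPE STATE_TIME ATTEMPT MAX_ATTEMPTS
#

require 'rubygems'
require 'smqueue'
require 'json'

# Details of the service check, filled in from the Nagios macros.
message = {
  :hostname => ARGV[2],
  :service => ARGV[3],
  :state => ARGV[4],
  :state_type => ARGV[5],
  :state_time => ARGV[6].to_i,
  :attempt => ARGV[7].to_i,
  :max_attempts => ARGV[8].to_i,
  :time_t => Time.now.to_i
}

# Where to publish: the broker's hostname and the topic or queue name.
configuration = {
  :host => ARGV[0],
  :name => ARGV[1],
  :adapter => :StompAdapter
}

# Publish the event as JSON.
broadcast = SMQueue(configuration)
broadcast.put message.to_json, "content-type" => "application/json"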

You’ll need Ruby and RubyGems installed. Once you have those, install the script like this:

sudo su -
gem sources -a http://gems.github.com/
gem install seanohalpin-smqueue json --no-ri --no-rdoc
cd /usr/bin
wget http://gist.github.com/raw/306765/2a3e9cbade88b4c6dd430e108bc8a28f95047462/notify-service-by-stomp.rb
chmod +x notify-service-by-stomp.rb

Once it's installed, tell Nagios to use it by adding this to your Nagios configuration:

define command {
  command_name notify-service-by-stomp
  command_line /usr/bin/notify-service-by-stomp.rb mq.example.com /topic/foo.bar.baz.quux $HOSTADDRESS$ "$SERVICEDESC$" $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEDURATIONSEC$ $SERVICEATTEMPT$ $MAXSERVICEATTEMPTS$
}

global_service_event_handler=notify-service-by-stomp

Change mq.example.com to be the hostname of your message broker, and /topic/foo.bar.baz.quux to be the topic or queue that you’d like notifications to be sent to. Restart Nagios and you should start receiving notifications on that queue or topic.

If you don’t receive notifications from Nagios very often then a simple way to test that this is working is to attach stompcat – a cat-type tool that uses STOMP as a source – to the topic or queue, then send a few test notifications to the same queue by manually running the same command that Nagios would.

Here’s a simple stompcat tool in case you don’t have one handy:

#! /usr/bin/env ruby

# Run me like this:
#
#   ./stompcat.rb mq.example.com /topic/foo.bar.baz.quux
#

require 'rubygems'
require 'smqueue'

configuration = {
  :host => ARGV[0],
  :name => ARGV[1],
  :adapter => :StompAdapter
}

source = SMQueue(configuration)
source.get do |m|
  payload = m.body
  puts ">>> #{payload}"
end

Here’s how to send notifications to the queue or topic:

/usr/bin/notify-service-by-stomp.rb mq.example.com \
  /topic/foo.bar.baz.quux service-host.example.com "SERVICE NAME" \
  WARNING HARD 86492 6 6

If it’s working you should get an entry like this showing up where you’re running the stompcat:

{
  "time_t":1266427384,
  "state":"WARNING",
  "state_type":"HARD",
  "state_time":86492,
  "attempt":6,
  "hostname":"service-host.example.com",
  "max_attempts":6,
  "service":"SERVICE NAME"
}

You should be able to change the stompcat example to perform more complex and interesting actions – looking up clients in a database, sending text messages if an account has enough credit, whatever you fancy. If you come up with something fun, please let me know!
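
As a starting point for that kind of consumer, here is a minimal sketch of the decision logic on its own, with no broker attached. It assumes the JSON payload shown above. The method name channels_for, the channel names (:email, :sms) and the policy itself (stay quiet for SOFT states, add SMS for CRITICAL) are all made up for illustration; they are not part of Nagios or SMQueue.

```ruby
require 'json'

# Hypothetical policy: decide which channels to notify for one event.
# The channel names and the rules are illustrative assumptions only.
def channels_for(payload)
  event = JSON.parse(payload)

  # In a SOFT state Nagios is still retrying the check, so stay quiet.
  return [] unless event["state_type"] == "HARD"

  case event["state"]
  when "CRITICAL" then [:email, :sms]
  when "WARNING", "UNKNOWN", "OK" then [:email]
  else []
  end
end

example = '{"state":"WARNING","state_type":"HARD","service":"SERVICE NAME"}'
p channels_for(example)
```

Inside the stompcat loop this could be called as channels_for(m.body), with the result used to pick which senders to invoke.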

Using NTPD in an Ubuntu 8.04 Xen Virtual Machine

It's a good idea to have an accurate clock on any computer you access - apart from anything else it means your logs will be consistent, making event replays easier. Unfortunately, over time each computer's clock will slowly drift away from the actual time. NTP - Network Time Protocol - keeps the clock accurate by synchronising it with a group of computers elsewhere on the internet. Unfortunately, Xen guests such as those provided by Xeriom Networks tend to be tied to the clock of the Xen host. This is a quick walk-through showing how to set up NTP and remove the dependence on the physical host.

Taking a shortcut

If you're running an Ubuntu-based VM and you use the package host at Xeriom Networks then you can run this simple command to set up NTP.

sudo apt-get install xeriom-ntp-client --yes --force-yes

Gaining Independence

To stop your VM's clock being slaved to the host's, simply tell the kernel that the clock is independent.

sudo su -c "echo 1 > /proc/sys/xen/independent_wallclock"

To make sure that this persists over reboots, edit /etc/sysctl.conf to include xen.independent_wallclock = 1.

Installing and configuring NTPD

An NTP daemon can be installed using apt-get.

sudo apt-get install ntp --yes

If you're on one of Xeriom's VMs you can use time.xeriom.net as a timesource. Edit /etc/ntp.conf to use it, and perhaps a few other servers from the NTP pool nearest to you.

server time.xeriom.net prefer
server 0.uk.pool.ntp.org
server 1.uk.pool.ntp.org
server 2.uk.pool.ntp.org

Restart NTP and you're done.

sudo /etc/init.d/ntp restart

What's the time Mr. Wolf?

NTP will take maybe 15 minutes to settle down and select the best possible configured timesource to synchronise with. You can check how it's doing by using ntpq -p.

ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*time.xeriom.net 212.13.194.87    3 u   39   64  377    0.402  -45.729   7.496
+dns1.rmplc.co.u 195.66.241.3     2 u   39   64  377    3.443  -54.808   6.142
+ntpt1.core.thep 194.152.64.68    3 u   40   64  377    0.723  -53.765   5.965
+weevil.pwns.ms  249.240.53.144   2 u   38   64  377    9.110  -57.739  11.427

This output contains quite a bit of information - for more complete details check out http://www.novell.com/coolsolutions/trench/418.html.
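
If you'd rather check this from a script than by eye, the following is a rough sketch that parses the peer table. It assumes the column layout shown above, preceded by the two header lines. The method name parse_ntpq is hypothetical. The leading character on each peer line is the tally code: "*" marks the peer currently selected for synchronisation and "+" marks a candidate. The reach column is an octal bitmask of the last eight polls, so 377 means all eight were answered, and the offset is in milliseconds.

```ruby
# Parse the peer lines of `ntpq -p` output into hashes.
# Assumes two header lines followed by one line per peer.
def parse_ntpq(output)
  output.lines.drop(2).map do |line|
    next if line.strip.empty?
    tally = line[0]               # "*" selected, "+" candidate, " " and others not in use
    fields = line[1..-1].split
    {
      :selected => tally == "*",
      :remote   => fields[0],
      :stratum  => fields[2].to_i,
      :reach    => fields[6].to_i(8),  # octal bitmask of the last 8 polls
      :offset   => fields[8].to_f      # milliseconds
    }
  end.compact
end

sample = <<EOS
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*time.xeriom.net 212.13.194.87    3 u   39   64  377    0.402  -45.729   7.496
+dns1.rmplc.co.u 195.66.241.3     2 u   39   64  377    3.443  -54.808   6.142
EOS

peer = parse_ntpq(sample).find { |entry| entry[:selected] }
puts "#{peer[:remote]} offset #{peer[:offset]} ms"
```

In real use the sample string would be replaced with the output of running ntpq -p on the VM.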

Now there's no excuse for being late

If you found this article useful, give me some love over at Working With Rails.

A simple email hub for your local network

I've been setting up the new Xeriom Networks MX service and decided that I'd document what I've done for your perusal. If you think something should be done in a different way, please do leave comments!

Requirements

The requirements for the MX service are pretty simple. We don't need to do spam filtering, greylisting, logging or virus scanning. We're going to build a very simple service that provides reliable email delivery to hosts within our network and let our clients decide their own email policy. We will, however, do a little blacklist checking.

Installing the software

I'll use Postfix because I'm pretty familiar with it. This is going to be pretty simple since we don't do any filtering; the basic Postfix install matches the requirements above.

sudo apt-get install postfix --yes

Stop Postfix here since it starts automatically after install.

sudo /etc/init.d/postfix stop

Configuring Postfix

Make /etc/postfix/main.cf specify the following values.


# Don't reveal the OS in the banner.
smtpd_banner = $myhostname ESMTP $mail_name
biff = no

# appending .domain is the MUA's job.
append_dot_mydomain = no

# Send "delivery delayed" emails after 4 hours.
delay_warning_time = 4h

readme_directory = no

smtpd_tls_cert_file=/etc/ssl/certs/ssl-cert-snakeoil.pem
smtpd_tls_key_file=/etc/ssl/private/ssl-cert-snakeoil.key
smtpd_use_tls=yes
smtpd_tls_session_cache_database = btree:${data_directory}/smtpd_scache
smtp_tls_session_cache_database = btree:${data_directory}/smtp_scache

# This is mx1.xeriom.net. Change for mx2, mx3, etc.
myhostname = mx1.xeriom.net
myorigin = mx1.xeriom.net

# Map root, abuse and postmaster to real email addresses.
virtual_alias_maps = hash:/etc/postfix/virtual

alias_maps = hash:/etc/aliases
alias_database = hash:/etc/aliases
mydestination = 
relayhost = 
mynetworks = 127.0.0.0/8
mailbox_size_limit = 0
recipient_delimiter = +
inet_interfaces = all
local_transport = error:No local mail delivery
local_recipient_maps = 
smtpd_helo_required = yes

# Only allow the service to be used for hosts with final
# destinations within our VM network.
permit_mx_backup_networks = 193.219.108.0/24

# Only accept mail from nice people.
# Read and understand these blacklists' policies before you
# use them or you risk losing mail!
smtpd_client_restrictions = reject_rbl_client zen.spamhaus.org,
  reject_rbl_client cbl.abuseat.org,
  reject_rbl_client dul.dnsbl.sorbs.net

# Only relay mail for which this machine is a listed MX backup.
smtpd_recipient_restrictions = permit_mx_backup, reject

Create the aliases database and, as root, redirect abuse, root and postmaster mail to real email addresses.

newaliases
echo 'postmaster postmaster@xeriom.net' >> /etc/postfix/virtual
echo 'abuse abuse@xeriom.net' >> /etc/postfix/virtual
echo 'root root@xeriom.net' >> /etc/postfix/virtual
postmap /etc/postfix/virtual

Restart Postfix so the changes take effect.

sudo /etc/init.d/postfix restart

After installing, configuring and restarting the mail server we'll need to punch a hole in the firewall to allow traffic on the SMTP port. If you don't have a firewall set up, you should - set it up now.

sudo iptables -I INPUT 4 -p tcp --dport smtp -j ACCEPT
sudo sh -c "iptables-save -c > /etc/iptables.rules"

Testing the setup

First, check that the new MX is listed in the zone and that the final MX is within the networks specified in permit_mx_backup_networks. If they're not then edit the zone or the Postfix configuration. The domain that I'm testing this service with is emailmyfeeds.com.

dig MX emailmyfeeds.com +short
0 emailmyfeeds.com.
10 mx1.xeriom.net.
10 mx2.xeriom.net.

dig emailmyfeeds.com +short
193.219.108.60
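
The second half of that check, whether the address dig returned falls inside the network listed in permit_mx_backup_networks, can be sketched with Ruby's standard ipaddr library. The method name within_networks? is hypothetical, and the addresses are simply the ones used in this article.

```ruby
require 'ipaddr'

# True if `address` falls inside any of the given CIDR networks.
def within_networks?(address, networks)
  ip = IPAddr.new(address)
  networks.any? { |cidr| IPAddr.new(cidr).include?(ip) }
end

# Same value as permit_mx_backup_networks in main.cf.
backup_networks = ["193.219.108.0/24"]

puts within_networks?("193.219.108.60", backup_networks)
```

If this prints false for your domain's final MX, Postfix will refuse to relay for it however the zone is set up.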

After doing that, use telnet to send a trial email through the new MX box. Below is the entire SMTP conversation for a successful send.

telnet mx1.xeriom.net smtp
Trying 193.219.108.242...
Connected to 193.219.108.242.
Escape character is '^]'.
220 mx1.xeriom.net ESMTP Postfix
EHLO my-computer
250-mx1.xeriom.net
250-PIPELINING
250-SIZE 10240000
250-VRFY
250-ETRN
250-STARTTLS
250-ENHANCEDSTATUSCODES
250-8BITMIME
250 DSN
MAIL FROM: craig@xeriom.net
250 2.1.0 Ok
RCPT TO: craig@emailmyfeeds.com
250 2.1.5 Ok
DATA
354 End data with <CR><LF>.<CR><LF>
TEST!

.
250 2.0.0 Ok: queued as A6EED440BB

If, after you type the RCPT TO line, you get an error something like 554 5.7.1 <test@foo.com>: Recipient address rejected: Access denied, then either the domain doesn't have the MX listed in its zone file (or the change hasn't propagated through the DNS yet), or the final destination for the email doesn't fall within the ranges allowed by permit_mx_backup_networks.
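
If you end up scripting this test, it helps to pull the reply lines apart rather than match on the whole string. Here is a small sketch, assuming replies shaped like the ones in the conversation above: a three-digit code, an optional enhanced status code, then free text. The method name parse_smtp_reply and the outcome labels are hypothetical. The first digit gives the class of reply: 2 is success, 3 means the server is waiting for more input, 4 is a temporary failure and 5 is a permanent one.

```ruby
# Split one SMTP reply line into its parts and classify it by first digit.
def parse_smtp_reply(line)
  match = line.strip.match(/\A(\d{3})[ -](?:(\d\.\d{1,3}\.\d{1,3}) )?(.*)\z/)
  return nil unless match

  code = match[1].to_i
  outcome = case code / 100
            when 2 then :ok
            when 3 then :intermediate
            when 4 then :temporary_failure
            when 5 then :permanent_failure
            else :unknown
            end

  { :code => code, :enhanced => match[2], :text => match[3], :outcome => outcome }
end

reply = parse_smtp_reply("554 5.7.1 <test@foo.com>: Recipient address rejected: Access denied")
puts "#{reply[:code]} (#{reply[:outcome]}): #{reply[:text]}"
```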

You should also always, always check your MXs using an open relay checker - if you don't then you're helping spam distribution and I will hunt you down and hurt you.

Using the Xeriom MX service

If you're lucky enough to have a VM here at Xeriom Networks you'll be able to use this service from 2008-06-24 by following the instructions at http://wiki.xeriom.net/w/XeriomMXService.

Packaging and deployment with Ubuntu

After extensively customising some software on one of our hosts I decided that instead of repeating the procedure another 20 times I'd package the customisations and install that package onto the appropriate hosts. Only one problem: I had no idea how to create Ubuntu packages or distribute them.

After several hours gathering information from a lot of sources that never seem to tell you quite enough to get your software packaged and deployable, I've pulled together two articles. Hopefully they'll be helpful to others. Oh, and please do feel free to correct my mistakes - that's the philosophy behind a wiki after all.

Goodbye Kiwi

The very first server we ever commissioned was today taken out of the data-center and retired. kiwi.xeriom.net (or marmaduke.xeriom.net as it was known before 2005) provided three years of brilliant service, first as a shared hosting node and later as a log server. I personally spent hours playing with the box, trying to get everything just so and it was a great bit of kit.

Thank you little dude. It's been a blast.

About the boy


Craig Webster is a software engineer living in London. He usually works with Ruby although sometimes he sneaks in some Erlang or JavaScript. He's into rock climbing, snowboarding, skating, photography and fencing. Yes, this does mean he has a sword.

Near here you'll find Craig's homepage, contact details, PGP key and keysigning policy, and talks.

Licence

The entire content of this blog is public domain. Use it however you fancy. You don't even need to attribute it to me, although it would be nice if you did. Just don't sue me and we'll all be happy.
