Decoupling Nagios Host and Service check events for fun and profit

Nagios does a pretty good job of watching over my services and hosts, but I want to do a little more with the events it creates – when it checks a service and something is wrong, or when something recovers. In particular I want to give my clients the ability to select at an incredibly high resolution what sort of notifications they get, for what services, how often, and at what level of technical detail. Coupled with this I want to up-sell the services that Xeriom offers – if the disk is getting full or the transfer quota is being consumed so fast that it won’t last until the end of the month I want to make it easy to upgrade plans. I’d also like to be able to try out some fun things – iPhone push notifications, SMS gateways, audible alarms, whatever – without worrying that I might destroy Nagios and bring my monitoring setup to its knees.

Message queues are a great way of decoupling systems, moving risk and complexity elsewhere. Nagios shouldn’t have to worry about all of the stuff I want to build around the monitoring system, it should focus just on the core features that I like it for: monitoring my hosts and services.

Luckily, I already have ActiveMQ running for other tasks, writing a STOMP client using SMQueue is pretty trivial, and Nagios has several ways to execute external commands when events happen including the global host and service event handlers. All I need is a command to have Nagios run that’ll accept a bunch of information from Nagios and stick it on the message queue.

Here’s what I came up with:

require 'rubygems'
require 'smqueue'
require 'json'

message = {
  :hostname => ARGV[2],
  :service => ARGV[3],
  :state => ARGV[4],
  :state_type => ARGV[5],
  :state_time => ARGV[6].to_i,
  :attempt => ARGV[7].to_i,
  :max_attempts => ARGV[8].to_i,
  :time_t => Time.now.to_i
}

configuration = {
  :host => ARGV[0],
  :name => ARGV[1],
  :adapter => :StompAdapter
}

broadcast = SMQueue(configuration)
broadcast.put message.to_json, "content-type" => "application/json"

You’ll need Ruby and RubyGems installed. Once you have those, install the script like this:

sudo su -
gem sources -a http://gems.github.com/
gem install seanohalpin-smqueue json --no-ri --no-rdoc
cd /usr/bin
wget http://gist.github.com/raw/306765/2a3e9cbade88b4c6dd430e108bc8a28f95047462/notify-service-by-stomp.rb
chmod +x notify-service-by-stomp.rb

Once it's installed, tell Nagios to use it by adding this to your Nagios configuration:

define command {
  command_name notify-service-by-stomp
  command_line /usr/bin/notify-service-by-stomp.rb mq.example.com /topic/foo.bar.baz.quux $HOSTADDRESS$ "$SERVICEDESC$" $SERVICESTATE$ $SERVICESTATETYPE$ $SERVICEDURATIONSEC$ $SERVICEATTEMPT$ $MAXSERVICEATTEMPTS$
}

global_service_event_handler=notify-service-by-stomp

Change mq.example.com to be the hostname of your message broker, and /topic/foo.bar.baz.quux to be the topic or queue that you’d like notifications to be sent to. Restart Nagios and you should start receiving notifications on that queue or topic.

If you don’t receive notifications from Nagios very often, a simple way to test that this is working is to attach stompcat – a cat type tool that uses STOMP as a source – to the topic or queue, then send a few test notifications to the same queue by manually running the same command that Nagios would.

Here’s a simple stompcat tool in case you don’t have one handy:

#! /usr/bin/env ruby

# Run me like this:
#
#   ./stompcat.rb mq.example.com /topic/foo.bar.baz.quux
#

require 'rubygems'
require 'smqueue'

configuration = {
  :host => ARGV[0],
  :name => ARGV[1],
  :adapter => :StompAdapter
}

source = SMQueue(configuration)
source.get do |m|
  payload = m.body
  puts ">>> #{payload}"
end

Here’s how to send notifications to the queue or topic:

/usr/bin/notify-service-by-stomp.rb mq.example.com \
  /topic/foo.bar.baz.quux service-host.example.com "SERVICE NAME" \
  WARNING HARD 86492 6 6

If it’s working you should get an entry like this showing up where you’re running the stompcat:

{
  "time_t":1266427384,
  "state":"WARNING",
  "state_type":"HARD",
  "state_time":86492,
  "attempt":6,
  "hostname":"service-host.example.com",
  "max_attempts":6,
  "service":"SERVICE NAME"
}

You should be able to change the stompcat example to perform more complex and interesting actions – looking up clients in a database, sending text messages if an account has enough credit, whatever you fancy. If you come up with something fun, please let me know!
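
As a starting point, a consumer might parse each JSON payload and decide which channels to notify. This is only a minimal sketch: the PREFERENCES table is a hypothetical stand-in for a real client database, the channel names are made up, and the rule that only HARD states trigger a notification is my own assumption rather than anything Nagios or SMQueue dictates.

```ruby
require 'json'

# Hypothetical per-host preferences; a real system would look these up
# in a database keyed by client.
PREFERENCES = {
  "service-host.example.com" => {
    "WARNING"  => [:email],
    "CRITICAL" => [:email, :sms],
    "OK"       => [:email]
  }
}

# Returns the list of channels that should be notified for one payload.
def channels_for(payload, preferences = PREFERENCES)
  event = JSON.parse(payload)
  # SOFT states mean Nagios is still retrying, so stay quiet (an assumption).
  return [] unless event["state_type"] == "HARD"
  host_prefs = preferences.fetch(event["hostname"], {})
  host_prefs.fetch(event["state"], [])
end

sample = '{"hostname":"service-host.example.com","service":"SERVICE NAME",' \
         '"state":"WARNING","state_type":"HARD","state_time":86492,' \
         '"attempt":6,"max_attempts":6,"time_t":1266427384}'
puts channels_for(sample).inspect
```

Calling channels_for from inside the stompcat block, with m.body as the payload, would be one way to wire this into the tool above.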

Keeping the software on your Ubuntu server up-to-date

New exploits are found just about every day in software both old and new. To combat this, software vendors release security updates which the Ubuntu team package up and use to release new, more secure packages of the software that you install.

It’s very hard to provide software updates for all versions of all software packages ever built for a platform like Ubuntu so the Ubuntu team produce releases of Ubuntu Linux which they offer to support for a known period of time. These support periods are currently one of two lengths depending on the release of Ubuntu you use. Long Term Support (usually noted by including “LTS” in the name) releases for the server are supported for 5 years after the release date. Regular releases are supported for 18 months. Beyond these windows support for the release is dropped and you won’t be able to upgrade or update any packages without a lot of work, so it’s important to upgrade before support for your release is stopped.

The current commonly used releases, their release date, and their end of support window are as follows:

Version        Name               Release Date       Support Ends
10.04 [LTS]    Lucid Lynx         April 2010         April 2015
9.10           Karmic Koala       October 29 2009    April 2011
9.04           Jaunty Jackalope   April 23 2009      October 2010
8.10           Intrepid Ibex      October 30 2008    April 2010
8.04.4 [LTS]   Hardy Heron        January 28 2010    April 2013
8.04.3 [LTS]                      July 16 2009       April 2013
8.04.2 [LTS]                      January 22 2009    April 2013
8.04.1 [LTS]                      July 3 2008        April 2013
8.04 [LTS]                        April 24 2008      April 2013
7.10           Gutsy Gibbon       October 18 2007    April 2009
7.04           Feisty Fawn        April 19 2007      October 2008
6.10           Edgy Eft           October 26 2006    April 2008
6.06.2 [LTS]   Dapper Drake       January 21 2008    June 2011
6.06.1 [LTS]                      August 10 2006     June 2011
6.06 [LTS]                        June 1 2006        June 2011
5.10           Breezy Badger      October 12 2005    April 2007
5.04           Hoary Hedgehog     April 8 2005       October 2006
4.10           Warty Warthog      October 26 2004    April 2006

Currently supported are 6.06, 8.04, 8.10, 9.04 and 9.10. Ubuntu 10.04 will be released in April.
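
The support windows are simple date arithmetic, so they are easy to check for yourself. The sketch below assumes the rule described above (5 years for LTS server releases, 18 months for regular releases) and counts from the original release date, not from the date of a point release such as 8.04.4. The function name is my own.

```ruby
require 'date'

# Support windows described above: 5 years for LTS server releases,
# 18 months for regular releases.
LTS_MONTHS = 60
REGULAR_MONTHS = 18

# Returns the date on which support for a release ends.
def support_ends(release_date, lts)
  release_date >> (lts ? LTS_MONTHS : REGULAR_MONTHS)
end

karmic = support_ends(Date.new(2009, 10, 29), false)
hardy  = support_ends(Date.new(2008, 4, 24), true)
puts karmic.strftime("%B %Y")  # April 2011
puts hardy.strftime("%B %Y")   # April 2013
```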

Your responsibilities

There are two things that you as a server operator need to know how to do to stay up-to-date with the software on your server and make sure that it’s always supported. One is upgrading installed packages, and one is upgrading to the next release of Ubuntu. I’ll cover them both one at a time, but first we’ll do a little setup to make sure that both operations are nice and fast.

Using a package mirror

A very time consuming part of the update and upgrade process is downloading a large quantity of files from the servers that host the latest software packages. To make this faster Xeriom Networks provide a mirror of the software packages for 8.04, 8.10, 9.04 and 9.10. If you’re not hosted on Xeriom Networks (why not?) you should ask your current provider if they supply a package mirror for your release. If they don’t you should skip this section and hope that your connection is fast enough to cope.

Using the package mirror is simple and requires you to edit just one file. A nice simple editor is nano. Install it by connecting to your server by SSH and using sudo and apt-get to install it:

sudo apt-get install nano --yes

The contents of the file are determined by which release of Ubuntu you are running. Find out which release you run by typing this:

cat /etc/lsb-release

You should then match your release up with the appropriate box on this Wiki page: http://wiki.xeriom.net/w/XeriomUbuntuPackagesService
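
If you look after more than a handful of servers you may prefer to read the release from a script. Here is a minimal sketch that parses the KEY=value format of /etc/lsb-release; the sample text is what I would expect on an 8.10 machine, and the function name is my own.

```ruby
# Parses the KEY=value lines found in /etc/lsb-release into a Hash.
def parse_lsb_release(text)
  text.each_line.with_object({}) do |line, info|
    key, value = line.strip.split("=", 2)
    next if key.nil? || value.nil?
    info[key] = value.delete('"')
  end
end

# Sample contents; on a real server use File.read("/etc/lsb-release").
sample = [
  'DISTRIB_ID=Ubuntu',
  'DISTRIB_RELEASE=8.10',
  'DISTRIB_CODENAME=intrepid',
  'DISTRIB_DESCRIPTION="Ubuntu 8.10"'
].join("\n")

info = parse_lsb_release(sample)
puts "#{info['DISTRIB_RELEASE']} (#{info['DISTRIB_CODENAME']})"
```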

Copy the text from within only the box that’s matched with your release. Now we can edit the file to add the local package server:

sudo nano -w /etc/apt/sources.list

Delete all the existing lines in that file and replace them with the text you copied from the Wiki page. Now press CTRL and X and say that you do want to save the file.

Now we tell Ubuntu to refresh the list of software packages so it knows what’s available on the local package mirror:

sudo apt-get update

Congratulations, you’re now using the Xeriom Ubuntu package mirror.

Upgrading installed software

Keeping your software up-to-date is an important part of keeping your server secure but since these new packages may break existing functionality it’s best not to install them automatically. You should sit down and do this yourself, only applying the updates if they’re appropriate and necessary.

To update the software installed on your server you use a set of tools called apt – and specifically you use apt-get.

To make sure you get the latest updates you should first update your server’s package database:

sudo apt-get update

Then you should ask it to upgrade your existing packages:

sudo apt-get upgrade

This will calculate everything that needs upgrading, show you a list of those packages and ask you if you want to continue. Most of the time this command will run smoothly and you’ll get the latest version of the software on your server, but you should check the list of packages that it will upgrade and be sure that you know what’s going to change before you let it complete.

Upgrading to the next release

Upgrading Ubuntu is a slightly more time consuming process with a small increase in risk because of the huge number of packages that will be upgraded. It will usually also require you to reboot your server since the kernel is likely to be upgraded so you should plan for a little downtime.

To upgrade you should use a package called update-manager-core. If this is your first time upgrading your Ubuntu release you may need to install it:

sudo apt-get install update-manager-core

Next configure it to target releases based on your preferred strategy. This is either lts, normal or never.

sudo nano -w /etc/update-manager/release-upgrades

Change the line that starts “Prompt=” to be whichever strategy you choose. For example, if I choose the lts strategy which should give me 5 years of support for each release I update to I’d enter “Prompt=lts” here. Now press CTRL and X and tell it that you want to save the file if it asks.
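
If you would rather make that change from a script than by hand, the edit is a one-line substitution. This is a rough sketch that works on the text of the file only; the sample contents and the set_prompt name are my own, and writing the result back to /etc/update-manager/release-upgrades is left to you.

```ruby
VALID_STRATEGIES = %w[lts normal never]

# Returns the config text with its Prompt= line set to the given strategy.
def set_prompt(config_text, strategy)
  unless VALID_STRATEGIES.include?(strategy)
    raise ArgumentError, "unknown strategy: #{strategy}"
  end
  config_text.gsub(/^Prompt=.*$/, "Prompt=#{strategy}")
end

# Illustrative file contents, not copied from a real server.
sample = "[DEFAULT]\nPrompt=normal\n"
puts set_prompt(sample, "lts")
```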

Now, before you upgrade, read the release notes for the version of Ubuntu that you’re going to upgrade to and make sure you understand all the current issues and caveats surrounding it.

Once you’re happy that you understand what you’re doing and you’ve scheduled a time to upgrade your server you can start the upgrade:

sudo do-release-upgrade

This will calculate all the packages that will be upgraded and ask you if you want to continue. Don’t just say yes – read the list of packages and make sure you understand what upgrading them means to your setup.

If it all goes pear-shaped

Sometimes things get messed up. Maybe the release wasn’t tested enough (rare these days) or perhaps the new release doesn’t support the same software as the one you upgraded from and you need that to run. If this ever happens we can create you a brand new image of the release you need as long as it’s still supported. Of course, your data won’t be on the new release so make sure your backups are up-to-date!

Simulating slow or laggy network connections in OS X

A client recently said that their site was loading slowly from remote sites. We got the specification of the network connection used, but I always forget how to do bandwidth limiting and latency simulation on OS X. This is a note for myself so I don't have to go searching again.

Configure a pipe that has the appropriate bandwidth limit and delay.

sudo ipfw pipe 1 config bw 16Kbit/s delay 350ms

Attach it to all traffic going to or from port 80.

sudo ipfw add 1 pipe 1 src-port 80
sudo ipfw add 2 pipe 1 dst-port 80

Now traffic coming from or going to port 80 anywhere is limited by the pipe that you specified. Do your testing and once you get frustrated with slow access to the web remove the rules like so:

sudo ipfw delete 1
sudo ipfw delete 2
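
To get a feel for what those numbers mean before testing, a back-of-the-envelope estimate helps. The sketch below is a rough lower bound only: it assumes the delay applies once in each direction (both rules feed the same pipe), treats 16Kbit/s as 16,000 bits per second, and ignores TCP handshakes, slow start and protocol overhead. The function name is my own.

```ruby
# Rough time to fetch a resource through the pipe, ignoring TCP
# handshakes, slow start and protocol overhead.
def transfer_seconds(bytes, bits_per_second, one_way_delay_seconds)
  round_trip = 2 * one_way_delay_seconds  # request out, response back
  round_trip + (bytes * 8.0) / bits_per_second
end

# 16Kbit/s taken as 16,000 bits per second (an assumption about units).
seconds = transfer_seconds(50_000, 16_000, 0.35)
puts format("%.1f seconds", seconds)
```

A 50,000 byte page comes out at a little under 26 seconds, which gives some idea of why the client was unhappy.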

High Availability ActiveMQ using a MySQL datastore

Now that we have ActiveMQ deployed it'd be quite nice to reduce the impact of a broker being unavailable - perhaps because it's dropped off the network, or because we want to upgrade the kernel or ActiveMQ install. Let's set up a High Availability ActiveMQ cluster.

High Availability Options

There are lots of ways to run ActiveMQ as a master / slave cluster for HA but we already have an HA MySQL setup so I'd like to use that as the datastore. In ActiveMQ terms that means I'd like to set up a JDBC master / slave cluster.

Setting up ActiveMQ to use a MySQL Datastore

It turns out that this is really easy to set up. First, configure ActiveMQ to use MySQL then make sure you're using InnoDB. The only change I made to these instructions was to change dataDirectory="${activemq.base}/activemq-data" to dataDirectory="${activemq.base}/data". Remember to change the broker name in activemq.xml to match the machine name. You've now got one broker running with a MySQL datastore.

Adding a slave for failover

To set up a slave, install a second instance of ActiveMQ doing exactly the same as above - make sure the broker name is unique. Umm... that's it!

Starting the cluster

Start the DaemonTools services. It doesn't really matter which broker is master so it doesn't matter which order you start them in.

svc -u /etc/service/activemq

When you tail the logs of both brokers you should see one stop after loading the database driver. It's trying to acquire the lock on the datastore and will stay here until the master fails and the lock is released. At that point it will take over as the master.

You can test failover by shutting down the current master. Success is shown in the logs of the slave that's taking over as master: it'll say it's acquired the lock.
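
Clients need to cope with the master moving, too. The sketch below shows one way a client might do that, assuming the waiting slave refuses client connections until it holds the lock: try each broker in turn and use the first one that answers. The connector argument is a hypothetical stand-in for whatever STOMP client you use, and the hostnames are made up.

```ruby
# Tries each broker in turn and returns the first successful connection.
# `connector` is any callable that takes a hostname and either returns a
# connection or raises; it stands in for a real STOMP client here.
def connect_to_master(hosts, connector)
  errors = {}
  hosts.each do |host|
    begin
      return connector.call(host)
    rescue StandardError => e
      errors[host] = e.message
    end
  end
  raise "no broker available: #{errors.inspect}"
end

# Simulated cluster: mq-01 is the waiting slave, mq-02 holds the lock.
fake_connector = lambda do |host|
  raise IOError, "connection refused" unless host == "mq-02.example.com"
  "connected to #{host}"
end

puts connect_to_master(["mq-01.example.com", "mq-02.example.com"], fake_connector)
```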

Deploying ActiveMQ on Ubuntu 8.10

I used Ubuntu 8.10 in this article but the instructions will probably work on 8.04 and 7.10 as well. I've not tested those though, and I'm not sure if it'll work on other versions of Ubuntu. Feedback would be awesome.

Prerequisites

ActiveMQ is a Java application so, well, you'll need Java installed.

sudo apt-get install openjdk-6-jre

Installing ActiveMQ

  1. Grab the latest stable release using wget. I used 5.2.0.
    wget http://www.apache.org/dist/activemq/apache-activemq/5.2.0/apache-activemq-5.2.0-bin.tar.gz
  2. Unpack it somewhere. I use /usr/local although I believe this may be bad practice. Leave a comment if there's somewhere better for this!
    sudo tar -xzvf apache-activemq-5.2.0-bin.tar.gz -C /usr/local/
  3. Configure the broker name in /usr/local/apache-activemq-5.2.0/conf/activemq.xml (replace all instances of "localhost" with the actual machine name)
  4. Start ActiveMQ by running /usr/local/apache-activemq-5.2.0/bin/activemq
  5. Fire up a browser and browse to http://brokername:8161/admin. You should see the ActiveMQ admin console.

    Keeping ActiveMQ running

    Running ActiveMQ (or indeed any service you don't absolutely have to) as root is a Bad Idea. Create an activemq user and make the data directory be owned by them.

    sudo adduser --system activemq
    sudo chown -R activemq /usr/local/apache-activemq-5.2.0/data

    I run ActiveMQ under DaemonTools to make sure it's always up. If you haven't already, install DaemonTools.

    Create a service directory for activemq and populate it with the required scripts.

    sudo mkdir -p /usr/local/apache-activemq-5.2.0/service/activemq/{,log,log/main}

    /usr/local/apache-activemq-5.2.0/service/activemq/run should look like this.

    #!/bin/sh
    exec 2>&1
    
    USER=activemq
    
    exec softlimit -m 1073741824 \
         setuidgid $USER \
    /usr/local/apache-activemq-5.2.0/bin/activemq

    /usr/local/apache-activemq-5.2.0/service/activemq/log/run should look like this.

    #!/bin/sh
    USER=activemq
    exec setuidgid $USER multilog t s1000000 n10 ./main

    Make both run scripts executable, make the log/main directory owned by activemq, and symlink the activemq service directory into /etc/service/.

    sudo sh -c "find /usr/local/apache-activemq-5.2.0/service/activemq -name 'run' |xargs chmod +x,go-wr"
    sudo chown activemq /usr/local/apache-activemq-5.2.0/service/activemq/log/main
    sudo ln -s /usr/local/apache-activemq-5.2.0/service/activemq /etc/service/activemq

    Now turn the keys and start it up.

    sudo svc -u /etc/service/activemq

    Tail the logs to make sure everything is happening as you'd expect.

    sudo tail -F /etc/service/activemq/log/main/current

    Trouble-shooting

    When I did this I got a bunch of stack traces with the following message.

    Caused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'org.apache.activemq.xbean.XBeanBrokerService#0' defined in class path resource [activemq.xml]: Invocation of init method failed; nested exception is java.lang.RuntimeException: java.io.FileNotFoundException: /usr/local/apache-activemq-5.2.0/data/kr-store/state/hash-index-store-state_state (Permission denied)

    This was because I stopped ActiveMQ after I changed ownership of the data directory causing it to dump the state file owned by another user. If you get the same problem just change the ownership of the data directory again.

    Thanks

    Thanks to Sean O'Halpin who introduced me to message queues and ActiveMQ (but who doesn't have a homepage or blog that I can link to) and Dave Evans who introduced me to Daemon Tools.

Running Daemontools under Ubuntu 8.10

Daemontools is a collection of tools that help manage processes. It's great for keeping daemons running - if they ever die then Daemontools just restarts them. Unfortunately the package for Ubuntu is a little broken because it relies on /etc/inittab and Ubuntu hasn't used this file for a long time. Here's how to install Daemontools and fix the problem.

Installing Daemontools

This bit is easy.

sudo apt-get install daemontools

Daemontools is installed. Easy, huh? Unfortunately it won't start after a reboot. That's bad. The daemontools-run package was meant to make Daemontools start at system bootup but unfortunately it relies on a system that uses init... and Ubuntu doesn't. It uses upstart instead.

Make Daemontools run at system startup

Create the file /etc/event.d/svscanboot with the following content.

start on runlevel 2
start on runlevel 3
start on runlevel 4
start on runlevel 5

stop on runlevel 0
stop on runlevel 1
stop on runlevel 6

respawn
exec /usr/bin/svscanboot

You'll also need to mkdir /etc/service since this is where the Ubuntu-installed Daemontools looks for service definitions.

Now tell the system to start the process.

sudo initctl start svscanboot

Other distributions

Plenty of other distributions use upstart instead of init. Getting DaemonTools running on them is pretty similar.

Scaling: Using MogileFS for storing uploaded images

As you might have guessed from several of my previous posts, the team I've been working in has recently been scaling an application. I've learned a bunch of things along the way, several of which I've got half-written articles about and which I'll totally finish one day, honest.

One of the most awesome technologies I've started using is MogileFS, a distributed BLOB store. In our application we use this to store user-generated assets such as uploaded images and syndication feeds. I'll not go into the pros and cons of the technology here (I might do that another time), rather I'd like to share some code that we've found rather useful when handling image uploads and adding them to MogileFS: the MogileFilesystemBackend for AttachmentFu.

It's necessary to use a shared filestore for uploaded images when the application cluster you're using for uploads needs to scale beyond one physical box as otherwise the uploaded images land on several disks and there's no telling if they'll be available to a particular request to your application (that depends on which application server serves the request).

Getting stuck in

I've done some rather ugly preparation for this work and monkey-patched Kernel to provide an attr_accessor called filestore which is just an instance of MogileFS::MogileFS from the rather excellent MogileFS client by the clever people over at Seattle RB. The patch, which I'm sure will make experienced Rubyists cry, looks like this.

module Kernel
  # Oh noes, I'm screwing with Kernel.
  # 
  mattr_accessor :filestore
end

During the Rails initializer execution the filestore is set up using configuration values pulled from a YAML file in RAILS_ROOT/config/.

Kernel.filestore = MogileFS::MogileFS.new(
  :domain => "APPNAME-#{RAILS_ENV}",
  :hosts => array_of_hosts_from_yaml_file
)

(What I actually do is quite a bit different from this but that's because I've done evil things to the MogileFS client library which I'll probably share in the future. For now, believe the magic).

Now that the setup is complete, how do we get AttachmentFu to work with the filestore? We use the MogileFilesystemBackend of course!

class Image < ActiveRecord::Base
  has_attachment :content_type => :image,
    :storage => :mogile_filesystem,
    :max_size => 5.megabytes,
    :thumbnails => {
      :canonical => '1024x'
    },
    :processor => "MiniMagick"

  validates_as_attachment
end

The power behind the man

Of course, without the actual backend code not much is going to happen. The implementation was pretty heavily influenced by the existing Amazon S3 backend, mostly because the idea behind S3 and MogileFS is very similar.

module MogileFilesystemBackend
  def full_filename(thumbnail = nil)
    "#{class_prefix}:#{filestore_tag(thumbnail)}"
  end

  def filestore_tag(thumbnail = nil)
    "#{parent_id || id}:#{thumbnail || :original}"
  end

  def current_content
    temp_path ? File.read(temp_path) : temp_data
  end
  
  def public_filename(thumbnail = nil)
    [
      editorial_object_type.demodulize.tableize,
      editorial_object_id,
      "#{class_prefix}.#{file_extension}#{thumbnail && "?size=#{thumbnail}"}"
    ].join("/")
  end

  def file_extension
    Mime::Type.lookup(content_type).to_sym
  end

  def filestore_paths(thumbnail = nil)
    filestore.get_paths(full_filename(thumbnail))
  end

  def file_data(thumbnail = nil)
    filestore.get_file_data(full_filename(thumbnail))
  end

  protected
  def current_content_location
    temp_path ? :temp_path : :temp_data
  end

  def destroy_file
    filestore.delete full_filename
  end

  def rename_file
    filestore.rename @old_filename, full_filename
  end

  def save_to_storage
    logger.info "Storing #{self.class.name}\##{id} as #{full_filename(thumbnail)} (class: #{replication_policy}) from #{current_content_location == :temp_path ? temp_path : :memory}"
    filestore.store_content full_filename(thumbnail), replication_policy, current_content
  end

  def class_prefix
    self.class.name.demodulize.underscore.downcase
  end
  alias_method :replication_policy, :class_prefix
end

Technoweenie::AttachmentFu::Backends::MogileFilesystemBackend = ::MogileFilesystemBackend
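
To make the naming scheme concrete, here is a standalone sketch of the keys that full_filename produces. It uses plain-Ruby stand-ins for the ActiveSupport string helpers so it runs outside Rails; mogile_key and the example class names are mine and are not part of the backend.

```ruby
# Plain-Ruby stand-in for the ActiveSupport helpers used by the backend:
# strip the module path, then convert CamelCase to snake_case.
def class_prefix(class_name)
  base = class_name.split("::").last
  base.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
end

# Mirrors full_filename: "<class prefix>:<parent id or id>:<thumbnail or original>".
def mogile_key(class_name, id, parent_id = nil, thumbnail = nil)
  "#{class_prefix(class_name)}:#{parent_id || id}:#{thumbnail || :original}"
end

puts mogile_key("Image", 42)                  # image:42:original
puts mogile_key("Image", 43, 42, :canonical)  # image:42:canonical
puts mogile_key("Media::UserAvatar", 7)       # user_avatar:7:original
```

Because a thumbnail is keyed on its parent's id, the original and every thumbnail of one upload share a prefix, and the class prefix doubles as the MogileFS replication class.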

Serving the public

So now you can get images into MogileFS, but in order to be useful we also need to serve them to the visitors of our application. That'll require a little work in the controller to make it read from the ever-present filestore instead of the database (if you're storing files in the database I will HURT you) or the local filesystem.

class ImageController < ApplicationController
  before_filter :load_image

  def show
    respond_to do |format|
      format.html
      format.any(:png, :jpg, :gif) do
        send_data @image.file_data(params[:size]),
          :type => @image.content_type,
          :disposition => 'inline'
      end
    end
  end
  
  protected
  def load_image
    @image = Image.find(params[:id])
  end
end

There we have it. Images can now be requested through the ImageController and served to your adoring fans.

Found this article useful?

If you enjoyed this article I'd appreciate recommendations at Working with Rails.

LDAP authentication in an Apache fronted Rails app

If you manage anything but the simplest of setups you've probably got an LDAP server setup providing directory services to your network. If you don't you should probably stop reading now ;)

Authenticate using LDAP

The first step to getting your Rails application authenticating using LDAP is to get Apache to authenticate all requests before they reach the application. This stuff is tricky and Apache already has a rather lovely module, mod_authnz_ldap, that does all the heavy lifting for us.

<VirtualHost 193.219.108.xxx:443>
  # I've used port 443 above because I'm dealing with passwords.
  # [...snip...]
  <Directory /var/www/foo.example.com/current/public>
    AuthType Basic
    AuthName "Foo Application Control Panel"
    AuthBasicAuthoritative off
    AuthBasicProvider ldap
    AuthLDAPUrl ldap://ldap.example.com/ou=people,dc=example,dc=com?userid?one
    Require valid-user
  </Directory>
  # [...snip...]
  # Your normal Rails HTTP configuration goes here
</VirtualHost>

Look up the user in Rails

Okay, so any request that hits your application is now authenticated against your LDAP directory. Next, tell Rails to look for the user. For authentication I wrote a rather funky (if I do say so myself) mixin, Xeriom::Acts::ProtectedSystem.

module Xeriom # :nodoc:
  module Acts # :nodoc:
    module ProtectedSystem # :nodoc:
      def self.included(base)
        base.send(:extend, ClassMethods)
      end

      module ClassMethods
        def acts_as_protected_system
          include InstanceMethods
          send(:before_filter, :ensure_user_is_logged_in)
          send(:helper_method, :current_user)
          send(:helper_method, :logged_in?)
        end
      end

      module InstanceMethods
        def ensure_user_is_logged_in
          if !logged_in?
            authenticate_user
          end
        end

        def logged_in?
          !current_user.blank?
        end

        def current_user
          @current_user ||= User.find_by_id(session[:user_id])
        end

        def current_user=(user)
          @current_user = user
          session[:user_id] = user.blank? ? nil : user.id
        end

        def authenticate_user
          authenticate_or_request_with_http_basic("Protected Area") do |username, password|
            # Lock your application servers down to listen to only
            # the web tier or this will kick your ass.
            send(:current_user=, User.find_by_username(username))
          end
        end
      end
    end
  end
end

ActionController::Base.send(:include, Xeriom::Acts::ProtectedSystem)

Like the code licence section in the sidebar says: this code is totally public domain, just don't sue me. To use it just drop the code in your lib/ directory and then call acts_as_protected_system in your ApplicationController.

class ApplicationController < ActionController::Base
  helper :all # include all helpers, all the time
  protect_from_forgery # because CSRF sucks!
  acts_as_protected_system # lock the door
end
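
It's worth seeing why the comment about locking down your application servers matters. Apache checks the password against LDAP; the mixin only reads the username out of the Basic auth header and never looks at the password. Here is a small standalone sketch of what that header contains, using core Ruby only. The helper names are my own and this is not how Rails implements it internally.

```ruby
# Builds the Authorization header value a browser sends for Basic auth.
def basic_auth_header(username, password)
  "Basic " + ["#{username}:#{password}"].pack("m0")
end

# Extracts [username, password] from a Basic Authorization header value.
def decode_basic_auth(header)
  scheme, encoded = header.split(" ", 2)
  return nil unless scheme == "Basic" && encoded
  encoded.unpack1("m").split(":", 2)
end

header = basic_auth_header("alice", "s3cret")
username, _password = decode_basic_auth(header)
puts username
```

Anyone who can reach the application server directly, bypassing Apache, can send a header like this with any username and whatever password they like.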

For bonus points...

If you found this article useful, give me some love over at Working With Rails.

High Availability MySQL on Ubuntu 8.04

In my previous post I showed how to implement a high availability web tier using Heartbeat and Apache. If you followed that you're probably pretty much sorted for serving static webpages, but what about dynamic webpages that are database driven? How do we make sure that the database is protected against failure of one of our nodes?

Preparation

You'll need two boxes and three IP addresses. Again, I've used virtual machines from Xeriom Networks. I've firewalled them and opened the MySQL and Heartbeat ports so that the servers can communicate with each other but no one else can access them.

# On db-01
sudo iptables -I INPUT 3 -p tcp --dport mysql -s db-02.vm.xeriom.net -j ACCEPT
sudo iptables -I INPUT 3 -p udp --dport mysql -s db-02.vm.xeriom.net -j ACCEPT
sudo iptables -I INPUT 3 -p udp --dport 694 -s db-02.vm.xeriom.net -j ACCEPT

# On db-02
sudo iptables -I INPUT 3 -p tcp --dport mysql -s db-01.vm.xeriom.net -j ACCEPT
sudo iptables -I INPUT 3 -p udp --dport mysql -s db-01.vm.xeriom.net -j ACCEPT
sudo iptables -I INPUT 3 -p udp --dport 694 -s db-01.vm.xeriom.net -j ACCEPT

Your firewall rules should now look something like below, the important lines being those ending in tcp dpt:mysql, udp dpt:mysql and dpt:694. The source for those lines should be the node that you're not checking the firewall rules on eg db-01 should have rules opening ports for db-02, and db-02 should have rules opening ports for db-01.

Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             anywhere            
ACCEPT     all  --  anywhere             anywhere            state RELATED,ESTABLISHED 
ACCEPT     udp  --  db-01                anywhere            udp dpt:694 
ACCEPT     udp  --  db-01                anywhere            udp dpt:mysql 
ACCEPT     tcp  --  db-01                anywhere            tcp dpt:mysql 
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:ssh

All being well, save your firewall rules so they're restored at reboot.

sudo sh -c "iptables-save -c > /etc/iptables.rules"

For the purpose of this post, let's assume that the following IP addresses are available and assigned to the boxes in brackets.

  • 193.219.108.241 - db-01 (db-01.vm.xeriom.net)
  • 193.219.108.242 - db-02 (db-02.vm.xeriom.net)
  • 193.219.108.243 - Not assigned

Start small

To begin with we'll install and configure MySQL for normal use on each of the boxes.

sudo apt-get install mysql-server --yes

Set a strong MySQL root password and wait for the packages to download and install, then edit /etc/mysql/my.cnf to make MySQL listen on all IP addresses.

bind-address = 0.0.0.0

Now restart MySQL and fire up the MySQL command-line client to check all is good.

sudo /etc/init.d/mysql restart
mysql -u root -p
Enter password: [enter the MySQL root password you chose earlier]
mysql> \q

If you got the mysql> prompt then MySQL is running. Try connecting to the other node across the network to see if the firewall is opened and MySQL is listening on the network interface.

mysql -h db-02.vm.xeriom.net -u root -p
Enter password: [enter the MySQL root password you chose earlier]
ERROR 1130 (00000): Host 'db-01' is not allowed to connect to this MySQL server

If you got the above error then everything is working fine - MySQL connected and refused to authorise the client. We'll create some valid accounts for this later. If you got a different error (such as the one below), check MySQL is running on both boxes and that the firewall rules are allowing connections from the correct hosts.

Can't connect to MySQL server on 'db-02' (10061)

One-way replication

The first thing we want to do is set up simple master-slave replication to see that it's possible to replicate data from one database host to the other. This requires a binary log, so tell MySQL on db-01 to keep one. Edit /etc/mysql/my.cnf and set the following values under the replication section.

server-id               = 1 
log_bin                 = /var/log/mysql/mysql-bin.log
expire_logs_days        = 10
max_binlog_size         = 100M
binlog_do_db            = my_application
binlog_ignore_db        = mysql
binlog_ignore_db        = test

On db-01 grant replication slave rights to db-02. Change some_password to a real, strong password. Afterwards, make sure you restart MySQL.

mysql -u root -p
Enter password: [enter the MySQL root password you chose earlier]
mysql> grant replication slave on *.* to 'replication'@'db-02.vm.xeriom.net' identified by 'some_password';
mysql> \q
sudo /etc/init.d/mysql restart

Jump on to db-02 and set it up to replicate data from db-01 by editing /etc/mysql/my.cnf, again replacing the hostname, username and password with the values for db-01.

server-id                 = 2
master-host               = db-01.vm.xeriom.net
master-user               = replication
master-password           = some_password
master-port               = 3306

One-way replication should now be set up. Restart MySQL and check the status of the slave on db-02. If the Slave_IO_State is "Waiting for master to send event" then you've been successful.

# Run this on db-02 only
sudo /etc/init.d/mysql restart
mysql -u root -p
Enter password: [enter the MySQL root password you chose earlier]
mysql> show slave status \G
*************************** 1. row ***************************
             Slave_IO_State: Waiting for master to send event
                Master_Host: 193.219.108.241
                Master_User: replication
                Master_Port: 3306
              Connect_Retry: 60
            Master_Log_File: mysql-bin.000005
        Read_Master_Log_Pos: 98
             Relay_Log_File: mysqld-relay-bin.000004
              Relay_Log_Pos: 235
      Relay_Master_Log_File: mysql-bin.000005
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
            Replicate_Do_DB: 
        Replicate_Ignore_DB: 
         Replicate_Do_Table: 
     Replicate_Ignore_Table: 
    Replicate_Wild_Do_Table: 
Replicate_Wild_Ignore_Table: 
                 Last_Errno: 0
                 Last_Error: 
               Skip_Counter: 0
        Exec_Master_Log_Pos: 98
            Relay_Log_Space: 235
            Until_Condition: None
             Until_Log_File: 
              Until_Log_Pos: 0
         Master_SSL_Allowed: No
         Master_SSL_CA_File: 
         Master_SSL_CA_Path: 
            Master_SSL_Cert: 
          Master_SSL_Cipher: 
             Master_SSL_Key: 
      Seconds_Behind_Master: 0
1 row in set (0.00 sec)
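
Reading the whole status dump by eye gets tedious, so here's a minimal sketch of how the check could be scripted. The function name replication_healthy is my own invention, not part of MySQL. It assumes the output format shown above and only looks at the two thread flags.

```shell
# Hypothetical helper (not part of MySQL): reads `show slave status \G`
# output on stdin and succeeds only if both replication threads are running.
replication_healthy() {
  awk -F': ' '
    /Slave_IO_Running:/  { sub(/[ \t\r]+$/, "", $2); io  = $2 }
    /Slave_SQL_Running:/ { sub(/[ \t\r]+$/, "", $2); sql = $2 }
    END { if (io == "Yes" && sql == "Yes") exit 0; exit 1 }
  '
}

# Example use on db-02 (prompts for the MySQL root password):
#   mysql -u root -p -e 'show slave status \G' | replication_healthy \
#     && echo "replication OK" || echo "replication broken"
```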

All being well it's time to test replication is working. We'll create the database we've configured replication for (my_application) on db-01 and watch as it appears on db-02 as well.

# On both nodes
mysql -u root -p
Enter password: [enter the MySQL root password you chose earlier]
mysql> show databases;

There should be two - mysql and test.

# On db-01 only
mysql -u root -p
Enter password: [enter the MySQL root password you chose earlier]
mysql> create database my_application;
# On both nodes
mysql -u root -p
Enter password: [enter the MySQL root password you chose earlier]
mysql> show databases;

The new database, my_application, should appear in the output on both nodes. Success! If it doesn't show on both nodes (it didn't for me the first time I set it up), here are some tips for finding out what's wrong.

Troubleshooting one-way replication

If the slave status above doesn't show Slave_IO_State: Waiting for master to send event, Slave_IO_Running: Yes and Slave_SQL_Running: Yes then something is wrong. This happened a few times while I was setting up replication - here's how I debugged it.

Telnet is one of the best tools in the world for debugging connectivity issues. If you haven't already, install it now.

sudo apt-get install telnet

SSH to the node that you want to check connectivity from (db-02) and telnet to the other node (db-01) on the MySQL port (3306).

# on db-02
telnet db-01.vm.xeriom.net mysql

The problem I encountered was ERROR 1130 (00000): Host 'db-02' is not allowed to connect to this MySQL server. This happens when an incorrect hostname was used in the grant replication slave query above. In my case I had granted access to clients using the full hostname (db-02.vm.xeriom.net) but MySQL looked in /etc/hosts and found a short name (db-02). Run the grant replication slave query again using the hostname given in the error message.

# on db-01
mysql -u root -p
Enter password: [enter the MySQL root password you chose earlier]
mysql> grant replication slave on *.* to 'replication'@'db-02' identified by 'some_password';
mysql> \q
sudo /etc/init.d/mysql restart

Another problem I encountered was that the slave status remained "connecting to master" for a long time. If you can connect using telnet this is probably caused by the server-id being the same on both servers. Check in /etc/mysql/my.cnf and if necessary change the values and restart MySQL.

Master-master replication

The above setup will replicate data one-way, but if you happen to write to the slave (db-02) then at best the data stored in the databases will be inconsistent, and there's a large possibility that replication will fail from that point onwards.

Setting up the master database so that it replicates data back from the slave would allow us to have a consistent data-set on both databases regardless of which we updated.

On db-02 edit /etc/mysql/my.cnf and configure it to keep a binary log of updates to the appropriate databases.

log_bin                 = /var/log/mysql/mysql-bin.log
expire_logs_days        = 10
max_binlog_size         = 100M
binlog_do_db            = my_application
binlog_ignore_db        = mysql
binlog_ignore_db        = test

Jump into MySQL on db-02 and grant replication slave privileges to the replication user on db-01.

# On db-02
mysql -u root -p
Enter password: [enter the MySQL root password you chose earlier]
mysql> grant replication slave on *.* to 'replication'@'db-01.vm.xeriom.net' identified by 'some_password';

Next, configure db-01 to replicate data using this account. Edit /etc/mysql/my.cnf on db-01 and set the values of the new master, db-02.

master-host               = db-02.vm.xeriom.net
master-user               = replication
master-password           = some_password
master-port               = 3306

Restart MySQL on both boxes and check that the slaves are reading from the appropriate master (db-01 reads from db-02 and db-02 reads from db-01).

sudo /etc/init.d/mysql restart
mysql -u root -p
Enter password: [enter the MySQL root password you chose earlier]
mysql> show slave status \G

If you don't get output that says Slave_IO_State: Waiting for master to send event, Slave_IO_Running: Yes and Slave_SQL_Running: Yes on both boxes then run through the troubleshooting section above.

If you've got this far your database is now running as a Master-Master cluster. Mmm, redundancy.

Heartbeat

The data is replicated two ways across the network so our data is protected against one host going down, but at the moment we still need to configure our applications to use one or the other host: failover must be handled by the application.

I wrote previously about using Heartbeat to provide a high availability web tier. We'll use the same technique to provide a floating IP address for the database. Our applications will connect to this IP address, and Heartbeat will make sure it's pointing at a live database. Since the databases are replicating data between each other it doesn't matter which database node our applications end up connecting to.

Install and configure Heartbeat on both boxes.

sudo apt-get install heartbeat

Next we'll copy and customise the authkeys, ha.cf and haresources files from the sample documentation to the configuration directory.

sudo cp /usr/share/doc/heartbeat/authkeys /etc/ha.d/
sudo sh -c "zcat /usr/share/doc/heartbeat/ha.cf.gz > /etc/ha.d/ha.cf"
sudo sh -c "zcat /usr/share/doc/heartbeat/haresources.gz > /etc/ha.d/haresources"

The authkeys should be readable only by root because it's going to contain a valuable password.

sudo chmod go-wrx /etc/ha.d/authkeys

Edit /etc/ha.d/authkeys and add a password of your choice so that it looks like below.

auth 2
2 sha1 your-password-here

Configure ha.cf according to your network. In this case the nodes are db-01.vm.xeriom.net and db-02.vm.xeriom.net. To figure out what your node names are run uname -n on each of the database boxes. The values you use in the node directives in the configuration file must match the names in uname -n.

logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
initdead 120
bcast eth0
udpport 694
auto_failback on
node db-01.vm.xeriom.net
node db-02.vm.xeriom.net
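
Mismatched node names are easy to miss, so here's a rough sketch of a check you could run on each box. node_listed is a made-up helper name, and it assumes ha.cf uses plain node directives like the ones above.

```shell
# Hypothetical helper: succeeds if the given node name appears in a
# `node` directive of the given ha.cf file.
node_listed() {
  # $1 = node name to look for, $2 = path to ha.cf
  awk -v name="$1" '
    $1 == "node" { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
    END { if (found) exit 0; exit 1 }
  ' "$2"
}

# Example use on each database box:
#   node_listed "$(uname -n)" /etc/ha.d/ha.cf || echo "ha.cf does not list $(uname -n)"
```

The match is exact, so a short name such as db-01 will not pass when ha.cf lists db-01.vm.xeriom.net.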

We need to tell Heartbeat we want it to look after MySQL. Edit haresources and make it look like the following - still on both machines.

db-01.vm.xeriom.net 193.219.108.243

This file must be identical on both nodes - even the hostname, which should be the output of uname -n on node 1. The IP address should be the unassigned IP address given above in the prelude section.

Start heartbeat on db-01 then db-02.

sudo /etc/init.d/heartbeat start

This process takes quite a while to start up. tail -f /var/log/ha-log on both boxes to watch what's happening. After a while you should see db-01 say something about completing acquisition.

heartbeat[7734]: 2008/07/07_17:19:34 info: Initial resource acquisition complete (T_RESOURCES(us))
IPaddr[7739]:   2008/07/07_17:19:37 INFO:  Running OK
heartbeat[7745]: 2008/07/07_17:19:37 info: Local Resource acquisition completed.

Testing it all works

Until now both boxes have been firewalled to allow MySQL connections only from each other. To prove that the database failover works we'll have to connect from another box, possibly your desktop or laptop. Find the public IP address of your chosen machine (here it's 193.214.108.10) and allow it to reach the Heartbeat IP address on both boxes.

# On both boxes
sudo iptables -I INPUT 3 -p tcp --dport mysql -s 193.214.108.10 -d 193.219.108.243 -j ACCEPT

Create a user which you can use to query the database, again on both boxes.

# on both boxes
mysql -u root -p
Enter password: [enter the MySQL root password you chose earlier]
mysql> grant all on my_application.* to 'some_user'@'193.214.108.10' identified by 'some_other_password';
mysql> grant replication client on *.* to 'some_user'@'193.214.108.10';
mysql> \q

Now connect to the IP address Heartbeat is managing (193.219.108.243) from your test box and run a query to show the slave status.

mysql -u some_user -p -h 193.219.108.243 my_application
mysql> show slave status \G
*************************** 1. row ***************************
             Slave_IO_State: Waiting for master to send event
                Master_Host: 193.219.108.242
[unimportant lines snipped]

Note that the master host is db-02, which means we're connected to db-01. Stop heartbeat (or shut down db-01) and run the query again. You should now see that the master has changed to the IP address of the other node.

Finally, bring Heartbeat back up on db-01 (or start the box if you stopped it) and run the query again. The master host should be the same as the first time.

Auto increment offsets

To avoid problems if the replication process fails, check out avoiding auto_increment collision.
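
The usual approach, as I understand it, is to give each master its own slice of the id space with MySQL's auto_increment_increment and auto_increment_offset settings. The values below (increment 2, offsets 1 and 2) are my assumption for a two-node setup; the linked article may pick different ones. The sketch only does the arithmetic so you can see that the two sequences never overlap, and ids_for_node is a made-up name.

```shell
# Illustration only: prints the first $3 ids a node would hand out when
# auto_increment_offset = $1 and auto_increment_increment = $2.
ids_for_node() {
  offset=$1; increment=$2; count=$3
  i=0; out=""
  while [ "$i" -lt "$count" ]; do
    out="$out$((offset + i * increment)) "
    i=$((i + 1))
  done
  echo "${out% }"
}

ids_for_node 1 2 5   # db-01: 1 3 5 7 9
ids_for_node 2 2 5   # db-02: 2 4 6 8 10
```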

Love me!

If you've found this article useful I'd appreciate beer and recommendations at Working With Rails.

Getting started with CouchDB: A simple address book application

I've recently installed CouchDB but, still being pretty new to this whole document store thing, don't really know what it can do or how to make it do it.

The best way to learn, of course, is to do. I've decided to build a simple address book application.

Investigation and technology choice

Since CouchDB talks JSON I figure that I'll write the address book in Javascript and HTML, and because CouchDB includes a web server I'll serve the application from the same place I store the data. I'll call the file that contains that addressbook application addressbook.html.

Taking a peek at the CouchDB configuration in /usr/local/etc/couchdb/couch.ini I see that the document root for the web server can be found at /usr/local/share/couchdb/www - that's where the addressbook.html file will go.

I'll need a database to store people's contact details in. There's a pretty nice interface to do this at /_utils/ which is accessible using a web browser by pointing it at the CouchDB server's IP address and port.

CouchDB comes with a Javascript wrapper which can be found at /_utils/script/couch.js but it only talks to a local server and I'm accessing the page across the internet so I'll steal some code from it and change it to work for my setup.

Implementation

First off, create the database. Jump into the interface at /_utils/ and create a database called "addressbook". That's where we'll store our data.

The user interface is going to be a webpage using Javascript which makes things pretty simple. I'll whip up a really simple page to start with.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <!-- The javascript will live in addressbook.js -->
  <script src="http://aaa.bbb.ccc.ddd:5984/_utils/addressbook.js"></script>
  <title>Address Book</title>
</head>
<body>
  <h1>Address Book</h1>
  <div id="addressbook">
    <p id="loading">Loading... please wait...</p>
  </div>
</body>
</html>

Since I've been spoiled by ActiveRecord I want to be able to say something like var people = Person.find("all"); in my code and have it return all Person records. I also want to be able to say Person.find("123456-1234-1234-123456"); to find an individual person.


Person = {
  // Push the implementation details of the database into a 
  // different object to keep Person clean.
  //
  database: AddressBook,
  find: function(id) {
    if(id == "all") {
      return this.database.allCards();
    } else {
      return this.database.openCard(id);
    }
  }
}

I've chosen to implement an AddressBook object that will abstract the details of database connection from the Person object. It will provide two methods, allCards and openCard(id). These methods talk to the CouchDB server and handle any and all data marshalling or other tricky bits and pieces.

AddressBook = {
  // Change this to point to your own CouchDB instance.
  uri: "http://craig-01.vm.xeriom.net:5984/addressbook/",

  _request: function(method, uri) {
    var req = new XMLHttpRequest();
    req.open(method, uri, false);
    req.send();
    return req;
  },

  // Fetch all address book cards.
  allCards: function() {
    var req = this._request("GET", this.uri + "_all_docs");
    var result = JSON.parse(req.responseText);
    if (req.status != 200)
      throw result;
    var allDocs = [];
    // _all_docs lists ids rather than whole documents, so fetch each card in turn.
    for(var i = 0; i < result.rows.length; i++) {
      var id = result.rows[i]["id"];
      allDocs.push(this.openCard(id));
    }
    return allDocs;
  },

  // Fetch an individual address book card.
  openCard: function(id) {    
    var req = this._request("GET", this.uri + id);
    if (req.status == 404)
      return null;
    var result = JSON.parse(req.responseText);
    if (req.status != 200)
      throw result;
    return result;
  }
}

I push responsibility for parsing JSON off to another library. Luckily, Yahoo provide a rather nice JSON library that does just what I'm looking for - I don't have to implement it, but I do need to pull it into the webpage, and make it appear in the global namespace.


<!-- Add this to the head of addressbook.html -->
<script src="http://yui.yahooapis.com/2.5.2/build/yahoo/yahoo-min.js"></script>
<script src="http://yui.yahooapis.com/2.5.2/build/json/json-min.js"></script>

// Make YUI JSON available in the global namespace.
// Add this to addressbook.js
JSON = YAHOO.lang.JSON;

The last piece of Javascript I need to show the address book is something to load all people from the address book and add them to the page. This uses the window.onload hook, which is bad practice, but for this little application it's a quick and easy way to kick off some code.

// This is horrible, I know, but it's just a simple example.
window.onload = function() {
  // Load every card before building the list.
  var people = Person.find("all");
  var addressbook = document.getElementById("addressbook");
  var personList = document.createElement("ul");
  for(var offset in people) {
    var person = people[offset];
    var personNode = document.createElement("li");
    var name = document.createTextNode(person.name);
    personNode.appendChild(name);
    personList.appendChild(personNode);
  }
  addressbook.removeChild(document.getElementById("loading"));
  addressbook.appendChild(personList);
}

That's it; the application is ready to go. Upload the addressbook.html and addressbook.js files to the document root of the CouchDB server, fire up your browser and navigate to http://aaa.bbb.ccc.ddd:5984/_utils/addressbook.html where aaa.bbb.ccc.ddd is the IP address of your CouchDB instance.

A blank page that says "Address Book" should greet you. Not very impressive, right? What went wrong? Actually, nothing went wrong. There's just no data in the database.

The interface that I pointed out before for browsing and creating databases can also be used to add documents to the database. Jump into it again, navigate to the addressbook database and add a document. When it asks you for an id, just leave the field blank: it'll create one automatically. Add a field to the document called name and click the little green checkbox beside the textbox, then double click on the value of the new field and set it to your own name in quotes, e.g. "Craig Webster". Click the green arrow beside the textbox then click "save document", jump back to the address book and hit refresh. The new record should now show up.

Moving forward

I've shown how to retrieve data from CouchDB using Javascript, but currently the data still has to be input using the CouchDB interface. Watch this space for an upcoming article on manipulating the database using Javascript so we can add cards to the addressbook.

Did this article help?

If this article helped you, I'd appreciate a beer if you meet me, or recommendations at Working With Rails.

Installing CouchDB 0.8.0 on Ubuntu 8.04

CouchDB is a distributed document store which can be manipulated using HTTP. A more detailed introduction is available on the CouchDB site.

Some assembly required

Since CouchDB is still a fairly young project there are no packages available to install it on Ubuntu. There are rumblings which seem to indicate that Intrepid Ibex will have a package, but until then here's a quick-n-dirty way to get CouchDB running on Ubuntu 8.04.

sudo apt-get install automake autoconf libtool subversion-tools help2man 
sudo apt-get install build-essential erlang libicu38 libicu-dev
sudo apt-get install libreadline5-dev checkinstall libmozjs-dev wget
wget http://mirror.public-internet.co.uk/ftp/apache/incubator/couchdb/0.8.0-incubating/apache-couchdb-0.8.0-incubating.tar.gz
tar -xzvf apache-couchdb-0.8.0-incubating.tar.gz
cd apache-couchdb-0.8.0-incubating
./configure
make && sudo make install
sudo adduser couchdb
sudo mkdir -p /usr/local/var/lib/couchdb
sudo chown -R couchdb /usr/local/var/lib/couchdb
sudo mkdir -p /usr/local/var/log/couchdb
sudo chown -R couchdb /usr/local/var/log/couchdb
sudo mkdir -p /usr/local/var/run
sudo chown -R couchdb /usr/local/var/run
sudo cp /usr/local/etc/init.d/couchdb /etc/init.d/
sudo update-rc.d couchdb defaults
sudo /etc/init.d/couchdb start

Let others REST on your Couch

By default CouchDB listens only for connections from the local host. To change that edit /usr/local/etc/couchdb/couch.ini and restart CouchDB.

If you're running a firewall (you should be) then open the correct port.

sudo iptables -I INPUT 3 -p tcp --dport 5984 -j ACCEPT

Testing that it all works

Since CouchDB talks HTTP we can use any HTTP client to check that it's running. Our web browser, for example. Fire it up and hit the IP address of the server on port 5984. If it's running and you can access it you should get back some details about the server.

{"couchdb":"Welcome","version":"0.8.0-incubating"}
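
If you'd rather check from a terminal than a browser, something like the sketch below should do. It assumes curl is installed (it isn't in the package list above), and couch_version is just a name I made up for the helper.

```shell
# Hypothetical helper: pulls the version string out of the CouchDB welcome
# document read from stdin. A sed one-liner is enough for this flat JSON.
couch_version() {
  sed -n 's/.*"version":"\([^"]*\)".*/\1/p'
}

# Example use (replace aaa.bbb.ccc.ddd with the server's IP address):
#   curl -s http://aaa.bbb.ccc.ddd:5984/ | couch_version
```

If the server isn't reachable the function prints nothing, which is easy to test for in a script.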

More CouchDB?

This is just one of several CouchDB articles on my blog, and there are plenty more on the way. Check out the other articles tagged CouchDB and check back often for new articles.

Love me!

If you've found this article useful I'd appreciate recommendations at Working With Rails.

High Availability Apache on Ubuntu 8.04

It's nice when your website continues to be served even when something catastrophic happens. Running two Apache nodes and Heartbeat will help - if one server blows up, the other will take over in short order.

Prelude

You'll need two boxes and three IP addresses. I use virtual machines from Xeriom Networks. I've firewalled them and opened the HTTP port to the world.

sudo iptables -I INPUT 3 -p tcp --dport http -j ACCEPT
sudo sh -c "iptables-save -c > /etc/iptables.rules"

For the purpose of this post, let's assume that the following IP addresses are available.

  • 193.219.108.236 - Node 1 (craig-02.vm.xeriom.net)
  • 193.219.108.237 - Node 2 (craig-03.vm.xeriom.net)
  • 193.219.108.238 - Not assigned

Simple Service

First we'll set up Apache on both boxes. Nothing complex - we just want to make sure that we can serve something to HTTP clients.

Run the following command on both boxes.

sudo apt-get install apache2 --yes

Now fire up a browser and hit the IP addresses assigned to Node 1 and Node 2. You should see the default Apache page stating "It works!". If you don't, check your firewall allows www traffic. Your firewall rules should look like the below - note the line ending tcp dpt:www.

sudo iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             anywhere            state RELATED,ESTABLISHED 
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:ssh 
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:www
DROP       all  --  anywhere             anywhere            

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Adding resilience

Apache can serve web pages from your machines now - that's great, but it doesn't protect against one of the machines dying. For that, we use a tool called heartbeat.

Install and configure Heartbeat on both boxes.

sudo apt-get install heartbeat

Next we'll copy and customise the authkeys, ha.cf and haresources files from the sample documentation to the configuration directory.

sudo cp /usr/share/doc/heartbeat/authkeys /etc/ha.d/
sudo sh -c "zcat /usr/share/doc/heartbeat/ha.cf.gz > /etc/ha.d/ha.cf"
sudo sh -c "zcat /usr/share/doc/heartbeat/haresources.gz > /etc/ha.d/haresources"

The authkeys should be readable only by root because it's going to contain a valuable password.

sudo chmod go-wrx /etc/ha.d/authkeys

Edit /etc/ha.d/authkeys and add a password of your choice so that it looks like below.

auth 2
2 sha1 your-password-here

Configure ha.cf according to your network. In this case the nodes are craig-02.vm.xeriom.net and craig-03.vm.xeriom.net. To figure out what your node names are run uname -n on each of the nodes. These must match the values you use in the node directives in the configuration file.

logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
initdead 120
bcast eth0
udpport 694
auto_failback on
node craig-02.vm.xeriom.net
node craig-03.vm.xeriom.net

We need to tell Heartbeat we want it to look after Apache. Edit haresources and make it look like the following - still on both machines.

craig-02.vm.xeriom.net 193.219.108.238 apache2

This file must be identical on both nodes - even the hostname, which should be the output of uname -n on node 1. The IP address should be the unassigned IP address given above in the prelude section.

In ha.cf we told Heartbeat to use UDP port 694 to communicate but because we're all nicely firewalled this port is blocked. Open it on both boxes.

sudo iptables -I INPUT 2 -p udp --dport 694 -j ACCEPT

Your iptables rules should now look similar to the output below.

sudo iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             anywhere            state RELATED,ESTABLISHED 
ACCEPT     udp  --  anywhere             anywhere            udp dpt:694 
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:ssh 
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:www 
DROP       all  --  anywhere             anywhere            

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Now create a file on each box that tells us which webserver we're looking at.

# Node 1 (craig-02.vm.xeriom.net)
sudo sh -c 'echo "craig-02.vm.xeriom.net" > /var/www/index.html'
# Node 2 (craig-03.vm.xeriom.net)
sudo sh -c 'echo "craig-03.vm.xeriom.net" > /var/www/index.html'

Check that this file shows up on each box by hitting the nodes' IP addresses in the browser. If that works, it's time to flip the switch.

It lives... IT LIVES!

Start heartbeat on the master (node 1 / craig-02.vm.xeriom.net) then the slave (node 2 / craig-03.vm.xeriom.net).

sudo /etc/init.d/heartbeat start

This process takes quite a while to start up. tail -f /var/log/ha-log on both boxes to watch what's happening. After a while you should see node 1 say something like this.

heartbeat[6792]: 2008/06/24_11:06:21 info: Initial resource acquisition complete (T_RESOURCES(us))
IPaddr[6867]:   2008/06/24_11:06:22 INFO:  Running OK
heartbeat[6832]: 2008/06/24_11:06:22 info: Local Resource acquisition completed.

Testing for a broken heart

If you now check the output of ifconfig eth0:0 on both boxes you should see output like below.

# Node 1
sudo ifconfig eth0:0
eth0:0    Link encap:Ethernet  HWaddr 00:16:3e:3c:70:25  
          inet addr:193.219.108.238  Bcast:193.219.108.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
# Node 2
sudo ifconfig eth0:0
eth0:0    Link encap:Ethernet  HWaddr 00:16:3e:92:ad:78  
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

Node 1 has taken over our virtual IP address. If you kill Node 1, Node 2 will take it over. You can simulate this by taking down the Heartbeat process on Node 1.

# Node 1
sudo /etc/init.d/heartbeat stop

Checking ifconfig again you should see that the virtual IP address has swapped nodes. If you bring up Node 1 again (start heartbeat) you should see the IP address swap back to that node.
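
To save squinting at ifconfig output while you flip Heartbeat on and off, the check can be sketched as a one-line function. holds_ip is a hypothetical name, and it assumes ifconfig prints addresses in the "inet addr:" format shown above.

```shell
# Hypothetical helper: succeeds if the ifconfig output on stdin shows the
# given address. Assumes the "inet addr:" format shown above.
holds_ip() {
  grep -qF "inet addr:$1 "
}

# Example use on either node:
#   sudo ifconfig eth0:0 | holds_ip 193.219.108.238 && echo "this node is active"
```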

If you got this far with no problems then congratulations, Heartbeat is running and your web tier will survive failure of a node. You can skip to the next section to see it working in the browser.

If you see some lines in the ha-log file telling you that the message queue is filling up then it's likely the two nodes can't communicate with each other. Check that you opened UDP port 694 on the firewall of both boxes.

heartbeat[6148]: 2008/06/24_11:05:09 ERROR: Message hist queue is filling up (500 messages in queue)

Check the firewall rules look like below - the important line is the one ending in udp dpt:694.

sudo iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             anywhere            state RELATED,ESTABLISHED 
ACCEPT     udp  --  anywhere             anywhere            udp dpt:694 
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:ssh 
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:www 
DROP       all  --  anywhere             anywhere            

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
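
If you'd rather not eyeball the rule list, the same check can be sketched as a small shell function. has_accept_rule is a made-up name; it assumes the column layout shown above and only looks for an ACCEPT rule with a matching protocol and destination port.

```shell
# Hypothetical helper: reads `iptables -L` output on stdin and succeeds if
# there is an ACCEPT rule for the given protocol and destination port.
has_accept_rule() {
  # $1 = protocol (tcp or udp), $2 = port exactly as iptables prints it
  awk -v proto="$1" -v want="dpt:$2" '
    $1 == "ACCEPT" && $2 == proto {
      for (i = 3; i <= NF; i++) if ($i == want) found = 1
    }
    END { if (found) exit 0; exit 1 }
  '
}

# Example use on both boxes (-n keeps ports numeric):
#   sudo iptables -L -n | has_accept_rule udp 694 || echo "UDP 694 is still blocked"
```

Without -n, iptables may print a service name instead of the port number, so the example sticks to numeric output.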

The proof is in the pudding

Mmm, cake.

Fire up your browser and hit the virtual IP address (193.219.108.238 in this post). You should see a page telling you that you're on Node 1.

Stop heartbeat (or shut down Node 1) and hit the IP address again in the browser. You should now see that you're hitting Node 2.

Finally, bring Heartbeat back up on Node 1 (or start the box if you stopped it) and hit the IP address again. You should now be hitting Node 1 again.

Love me!

If you've found this article useful I'd appreciate beer and recommendations at Working With Rails.

A simple email hub for your local network

I've been setting up the new Xeriom Networks MX service and decided that I'd document what I've done for your perusal. If you think something should be done in a different way, please do leave comments!

Requirements

The requirements for the MX service are pretty simple. We don't need to do spam filtering, greylisting, logging or virus scanning. We're going to build a very simple service that provides reliable email delivery to hosts within our network and let our clients decide their own email policy. We will do a little blacklist checking, however.

Installing the software

I'll use Postfix because I'm pretty familiar with it. This is going to be pretty simple since we don't do any filtering; the basic Postfix install matches the requirements above.

sudo apt-get install postfix --yes

Stop Postfix here since it starts automatically after install.

sudo /etc/init.d/postfix stop

Configuring Postfix

Make /etc/postfix/main.cf specify the following values.


# Don't reveal the OS in the banner.
smtpd_banner = $myhostname ESMTP $mail_name
biff = no

# appending .domain is the MUA's job.
append_dot_mydomain = no

# Send "delivery delayed" emails after 4 hours.
delay_warning_time = 4h

readme_directory = no

smtpd_tls_cert_file=/etc/ssl/certs/ssl-cert-snakeoil.pem
smtpd_tls_key_file=/etc/ssl/private/ssl-cert-snakeoil.key
smtpd_use_tls=yes
smtpd_tls_session_cache_database = btree:${data_directory}/smtpd_scache
smtp_tls_session_cache_database = btree:${data_directory}/smtp_scache

# This is mx1.xeriom.net. Change for mx2, mx3, etc.
myhostname = mx1.xeriom.net
myorigin = mx1.xeriom.net

# Map root, abuse and postmaster to real email addresses.
virtual_alias_maps = hash:/etc/postfix/virtual

alias_maps = hash:/etc/aliases
alias_database = hash:/etc/aliases
mydestination = 
relayhost = 
mynetworks = 127.0.0.0/8
mailbox_size_limit = 0
recipient_delimiter = +
inet_interfaces = all
local_transport = error:No local mail delivery
local_recipient_maps = 
smtpd_helo_required = yes

# Only allow the service to be used for hosts with final
# destinations within our VM network.
permit_mx_backup_networks = 193.219.108.0/24

# Only accept mail from nice people.
# Read and understand these blacklists policies before you
# use them or you risk losing mail!
smtpd_client_restrictions = reject_rbl_client zen.spamhaus.org,
  reject_rbl_client cbl.abuseat.org,
  reject_rbl_client dul.dnsbl.sorbs.net

# Only relay mail for which this machine is a listed MX backup.
smtpd_recipient_restrictions = permit_mx_backup, reject

Create the aliases database and redirect abuse, root and postmaster mail to a real email address.

sudo newaliases
sudo sh -c "echo 'postmaster postmaster@xeriom.net' >> /etc/postfix/virtual"
sudo sh -c "echo 'abuse abuse@xeriom.net' >> /etc/postfix/virtual"
sudo sh -c "echo 'root root@xeriom.net' >> /etc/postfix/virtual"
sudo postmap /etc/postfix/virtual

Restart Postfix so the changes take effect.

sudo /etc/init.d/postfix restart

After installing, configuring and restarting the mail server we'll need to punch a hole in the firewall to allow traffic on the SMTP port. If you don't have a firewall set up, you should - set it up now.

sudo iptables -I INPUT 4 -p tcp --dport smtp -j ACCEPT
sudo sh -c "iptables-save -c > /etc/iptables.rules"

Testing the setup

First, check that the new MX is listed in the zone and that the final MX is within the networks specified in permit_mx_backup_networks. If they're not then edit the zone or the Postfix configuration. The domain that I'm testing this service with is emailmyfeeds.com.

dig MX emailmyfeeds.com +short
0 emailmyfeeds.com.
10 mx1.xeriom.net.
10 mx2.xeriom.net.

dig emailmyfeeds.com +short
193.219.108.60
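
If you'd rather not eyeball the CIDR maths, the "is the final MX inside permit_mx_backup_networks" check can be sketched in a few lines of Ruby using the standard ipaddr library. The within_networks? helper and the PERMITTED constant are names I've made up for this example; the addresses come from the configuration and dig output above, plus one address from the 192.0.2.0/24 documentation range as a counter-example.

```ruby
require 'ipaddr'

# Returns true when `address` falls inside at least one of the CIDR `networks`.
# (within_networks? is a hypothetical helper, not part of Postfix.)
def within_networks?(address, networks)
  ip = IPAddr.new(address)
  networks.any? { |cidr| IPAddr.new(cidr).include?(ip) }
end

# The value of permit_mx_backup_networks from main.cf.
PERMITTED = ["193.219.108.0/24"].freeze

puts within_networks?("193.219.108.60", PERMITTED) # true: the address dig returned
puts within_networks?("192.0.2.10", PERMITTED)     # false: outside the VM network
```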

After doing that, use telnet to send a trial email through the new MX box. Below is the entire SMTP conversation for a successful send.

telnet mx1.xeriom.net smtp
Trying 193.219.108.242...
Connected to 193.219.108.242.
Escape character is '^]'.
220 mx1.xeriom.net ESMTP Postfix
EHLO my-computer
250-mx1.xeriom.net
250-PIPELINING
250-SIZE 10240000
250-VRFY
250-ETRN
250-STARTTLS
250-ENHANCEDSTATUSCODES
250-8BITMIME
250 DSN
MAIL FROM: craig@xeriom.net
250 2.1.0 Ok
RCPT TO: craig@emailmyfeeds.com
250 2.1.5 Ok
DATA
354 End data with <CR><LF>.<CR><LF>
TEST!

.
250 2.0.0 Ok: queued as A6EED440BB

If you get an error such as 554 5.7.1 <test@foo.com>: Recipient address rejected: Access denied after typing the RCPT TO line, then either the domain doesn't list this MX in its zone file (or the change hasn't propagated through the DNS yet), or the final destination for the email doesn't fall within the ranges allowed by permit_mx_backup_networks.
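
When reading replies like these, the first digit of the three-digit code is the bit that matters: 2 means success, 3 means the server wants more input, 4 is a temporary failure and 5 is a permanent one. A hyphen straight after the code (as in 250-PIPELINING) means more lines follow. Here's a rough Ruby sketch that pulls a reply line apart. SmtpReply, REPLY_PATTERN and parse_reply are names invented for this example, and the pattern only handles lines shaped like the ones in the conversation above.

```ruby
# A parsed SMTP reply line. SmtpReply and parse_reply are hypothetical
# helpers for illustration, not part of any library.
SmtpReply = Struct.new(:code, :continued, :enhanced, :text) do
  # Classify the reply by the first digit of its code.
  def outcome
    case code / 100
    when 2 then :success
    when 3 then :intermediate
    when 4 then :temporary_failure
    when 5 then :permanent_failure
    else :unknown
    end
  end
end

# code, separator (space or hyphen), optional enhanced status code, text.
REPLY_PATTERN = /\A(\d{3})([ -])(?:(\d\.\d{1,3}\.\d{1,3}) )?(.*)\z/

# Returns an SmtpReply, or nil if the line doesn't look like a reply.
def parse_reply(line)
  match = REPLY_PATTERN.match(line.chomp)
  return nil unless match

  SmtpReply.new(match[1].to_i, match[2] == "-", match[3], match[4])
end

reply = parse_reply("554 5.7.1 <test@foo.com>: Recipient address rejected: Access denied")
puts reply.code     # 554
puts reply.enhanced # 5.7.1
puts reply.outcome  # permanent_failure
```

A 5xx on RCPT TO is the server telling you that retrying won't help until the DNS or the Postfix configuration changes.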

You should also always, always check your MXs using an open relay checker - if you don't then you're helping spam distribution and I will hunt you down and hurt you.

Using the Xeriom MX service

If you're lucky enough to have a VM here at Xeriom Networks you'll be able to use this service from 2008-06-24 by following the instructions at http://wiki.xeriom.net/w/XeriomMXService.

Firewall a pristine Ubuntu 8.04 box

Follow these simple instructions to block all traffic but SSH to your box. Once you have these rules running you can punch more holes as required.

sudo apt-get install iptables
sudo iptables -A INPUT -i lo -j ACCEPT
sudo iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
sudo iptables -A INPUT -p tcp --dport ssh -j ACCEPT
sudo iptables -A INPUT -j DROP
sudo sh -c "iptables-save -c > /etc/iptables.rules"

If you'd like to load the rules when the box starts and save the current rules when it stops, change your /etc/network/interfaces file so that it contains pre-up and post-down hooks to load and save the rules.

pre-up    iptables-restore < /etc/iptables.rules
post-down iptables-save -c > /etc/iptables.rules

If you're hosted at Xeriom Networks and would like to be monitored by the monitoring service there, allow ICMP Type 8 from monitor.xeriom.net.

sudo iptables -I INPUT 4 -s 193.219.108.245 -p icmp -m icmp --icmp-type 8 -j ACCEPT

Remember to save the new rules to /etc/iptables.rules.

sudo sh -c "iptables-save -c > /etc/iptables.rules"

About the boy

Craig Webster is a software engineer living in London. He usually works with Ruby although sometimes he sneaks in some Erlang or JavaScript. He's into rock climbing, snowboarding, skating, photography and fencing. Yes, this does mean he has a sword.

Licence

The entire content of this blog is public domain. Use it however you fancy. You don't even need to attribute it to me, although it would be nice if you did. Just don't sue me and we'll all be happy.
