
Fail Silently with Memcache Client

For web applications caching is king and I've recently been playing with memcached to cache the results of expensive queries in a Rails application. As a client I've chosen Seattle RB's memcache-client.

The memcache-client library is rather lovely, but it does seem to have the opinion that if a memcached instance fails it should throw an exception which your code has to deal with. I don't agree with that: when a cache fails it doesn't matter. Either the application can continue running in an uncached mode - slow, but possible - or there are other memcache instances that can be used. Switching to either of these should require no special effort in code that uses the library.

Ruby, being awesome, lets me change the behaviour of the client library very easily. Monkey patching may be frowned upon, but it does have a use.

# A simple monkey-patch of MemCache so that broken memcached instances don't 
# cause fatal errors in the application. Performance may be severely degraded
# but it should be possible to use the app anyway!
# 
# A typical use would look something like:
# 
#   result = if cache.alive?
#     fetch = cache.get(:foo)
#     if !fetch
#       fetch = calculate(:foo)
#       cache.set(:foo, fetch)
#     end
#     fetch
#   else
#     calculate(:foo)
#   end
# 
class MemCache
  # Does the cache configuration contain any memcached instances that can 
  # currently be used?
  # 
  # Author: Conor Curran [http://forwind.net/]
  # 
  def alive?
    !!servers.detect { |s| s.alive? }
  end

  # Rescue from MemCache::MemCacheError -- we want the cache to fail silently
  # (at least from the point of view of the application - you should still
  # monitor memcached).
  # 
  def get_with_rescue(*args)
    get_without_rescue(*args)
  rescue MemCache::MemCacheError
  end
  alias_method :get_without_rescue, :get
  alias_method :get, :get_with_rescue
  alias_method :[], :get
  
  # Rescue from MemCache::MemCacheError -- we want the cache to fail silently
  # (at least from the point of view of the application - you should still
  # monitor memcached).
  # 
  def set_with_rescue(*args)
    set_without_rescue(*args)
  rescue MemCache::MemCacheError
  end
  alias_method :set_without_rescue, :set
  alias_method :set, :set_with_rescue
  alias_method :[]=, :set
  alias_method :add, :set

  # Rescue from MemCache::MemCacheError -- we want the cache to fail silently
  # (at least from the point of view of the application - you should still
  # monitor memcached).
  # 
  def delete_with_rescue(*args)
    delete_without_rescue(*args)
  rescue MemCache::MemCacheError
  end
  alias_method :delete_without_rescue, :delete
  alias_method :delete, :delete_with_rescue
end
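
If you want to see the alias_method chain working without a memcached server to hand, here's a minimal sketch. FlakyCache, CacheError and fetch_or_compute are made-up names for illustration only - they are not part of memcache-client - and the "cache" here always fails, which is exactly the case the patch is meant to survive.

```ruby
# Hypothetical stand-in for a cache client whose only server is down.
class FlakyCache
  class CacheError < StandardError; end

  def get(_key)
    raise CacheError, "no servers available"
  end
end

# Reopen the class and wrap #get, the same way the MemCache patch does.
class FlakyCache
  def get_with_rescue(*args)
    get_without_rescue(*args)
  rescue CacheError
    nil # a failed lookup is treated as a cache miss
  end
  alias_method :get_without_rescue, :get
  alias_method :get, :get_with_rescue
end

# Hypothetical helper: use the cached value, or compute it on a miss.
def fetch_or_compute(cache, key)
  cache.get(key) || yield
end

# The cache raises internally, but the caller only ever sees a miss.
RESULT = fetch_or_compute(FlakyCache.new, :foo) { 42 }
```

The calling code never mentions the exception, which is the whole point: a dead cache looks like an empty cache.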

Load-balanced highly available MySQL on Ubuntu 8.04

If you followed my previous post about high availability MySQL your application now has one less single point of failure. That's good, but what happens when your MySQL cluster begins to get overloaded? By load-balancing MySQL connections between hosts you can more easily accommodate a larger volume of queries.

A load balanced database cluster

Requirements

This article will build on the MySQL cluster introduced in my previous post. If you haven't already, set that up. You'll also need another two virtual machines, each with one IP address.

  • 193.219.108.239 - lb-db-01 (lb-db-01.vm.xeriom.net)
  • 193.219.108.240 - lb-db-02 (lb-db-02.vm.xeriom.net)
  • * 193.219.108.241 - db-01 (db-01.vm.xeriom.net)
  • * 193.219.108.242 - db-02 (db-02.vm.xeriom.net)
  • * 193.219.108.243 - virtual IP address

IP addresses marked with a * are brought over from the previous article.

All boxes have been firewalled. It's just plain common sense.

We have the technology

Install Heartbeat and MySQL Proxy on both load balancer boxes.

sudo apt-get install heartbeat mysql-proxy --yes

Configure and run MySQL Proxy

Open the firewall on the database boxes to allow the load balancing boxes to connect.

# On db-01 and db-02
sudo iptables -I INPUT 4 -p tcp \
  --dport mysql -s lb-db-01.vm.xeriom.net -j ACCEPT
sudo iptables -I INPUT 4 -p tcp \
  --dport mysql -s lb-db-02.vm.xeriom.net -j ACCEPT
sudo sh -c "iptables-save -c > /etc/iptables.rules"

If you followed the previous post you'll probably also want to remove the rule that allowed MySQL access from the test box to the floating IP address on the backend boxes. It's not hugely important at the moment, but it's nice to be neat. When you put this into production it will become much more important to control access to the database boxes.

# On db-01 and db-02
sudo iptables -D INPUT -p tcp --dport mysql -s 193.214.108.10 \
  -d 193.219.108.243 -j ACCEPT
sudo sh -c "iptables-save -c > /etc/iptables.rules"

Remember to swap 193.219.108.243 for your floating IP address and 193.214.108.10 for your test box IP address or you'll get a "bad rule" error.

You'll also need to open the MySQL Proxy port on the load balancer boxes. Note that MySQL Proxy listens on port 4040, not the regular MySQL port 3306. My test box here is 193.219.108.10 - use whichever IP address outside the database cluster you're going to connect from to test that the proxy works.

# On lb-db-01
sudo iptables -I INPUT 4 -p tcp \
  --dport 4040 -d lb-db-01.vm.xeriom.net -s 193.219.108.10 -j ACCEPT
sudo sh -c "iptables-save -c > /etc/iptables.rules"
# On lb-db-02
sudo iptables -I INPUT 4 -p tcp \
  --dport 4040 -d lb-db-02.vm.xeriom.net -s 193.219.108.10 -j ACCEPT
sudo sh -c "iptables-save -c > /etc/iptables.rules"

Run the proxy on both boxes, telling it the address of the real database servers, then try to connect from the test box.

sudo /usr/sbin/mysql-proxy \
  --proxy-backend-addresses=db-01.vm.xeriom.net:3306 \
  --proxy-backend-addresses=db-02.vm.xeriom.net:3306 \
  --daemon
# On the test box
mysql -u some_user -p'some_other_password' -h lb-db-01.vm.xeriom.net
mysql> \q
mysql -u some_user -p'some_other_password' -h lb-db-02.vm.xeriom.net
mysql> \q
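
If the mysql client just hangs, it helps to know whether the port is reachable at all before blaming MySQL. Here's a rough Ruby sketch of a TCP port check; port_open? is a name I've made up, and the two second timeout is an arbitrary assumption.

```ruby
require "socket"

# Returns true if a TCP connection to host:port succeeds within the timeout.
# Refused connections, timeouts and DNS failures all count as "not open".
def port_open?(host, port, timeout = 2)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end
```

From the test box you might call port_open?("lb-db-01.vm.xeriom.net", 4040) for each load balancer. A false result usually points at the firewall rules or at mysql-proxy not running, rather than at MySQL itself.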

You may be told that your load balancer hosts don't have access to the MySQL server. If this happens, log in to the MySQL hosts, add a user at the hostname that failed, and try again.

ERROR 1130 (00000): Host 'lb-db-01' is not allowed to connect to this MySQL server
# On db-01 and db-02
mysql -u root -p
Enter password: [Enter your MySQL root password]
mysql> grant all on my_application.* to 'some_user'@'lb-db-01' 
  identified by 'some_other_password';
mysql> grant all on my_application.* to 'some_user'@'lb-db-02' 
  identified by 'some_other_password';
mysql> \q

If you got MySQL prompts both times then both proxies are working. Remove the firewall rules that let your test box talk directly to each node and add rules that allow access only to the floating IP address.

# On lb-db-01
sudo iptables -D INPUT -p tcp \
  --dport 4040 -d lb-db-01.vm.xeriom.net -s 193.219.108.10 \
  -j ACCEPT
sudo iptables -I INPUT 4 -p tcp \
  --dport 4040 -d 193.219.108.243 -s 193.219.108.10 \
  -j ACCEPT
sudo sh -c "iptables-save -c > /etc/iptables.rules"
# On lb-db-02
sudo iptables -D INPUT -p tcp \
  --dport 4040 -d lb-db-02.vm.xeriom.net -s 193.219.108.10 \
  -j ACCEPT
sudo iptables -I INPUT 4 -p tcp \
  --dport 4040 -d 193.219.108.243 -s 193.219.108.10 \
  -j ACCEPT
sudo sh -c "iptables-save -c > /etc/iptables.rules"

Configure and run Heartbeat

Now it's time to configure Heartbeat on both boxes. Open up the firewall and then populate Heartbeat's configuration files.

# On lb-db-01
sudo iptables -I INPUT 4 -p udp \
  --dport 694 -s lb-db-02.vm.xeriom.net -j ACCEPT
sudo sh -c "iptables-save -c > /etc/iptables.rules"
# On lb-db-02
sudo iptables -I INPUT 4 -p udp \
  --dport 694 -s lb-db-01.vm.xeriom.net -j ACCEPT
sudo sh -c "iptables-save -c > /etc/iptables.rules"
# On both load balancer boxes
sudo cp /usr/share/doc/heartbeat/authkeys /etc/ha.d/
sudo sh -c "zcat /usr/share/doc/heartbeat/ha.cf.gz > /etc/ha.d/ha.cf"
sudo sh -c "zcat /usr/share/doc/heartbeat/haresources.gz > /etc/ha.d/haresources"

The authkeys file should be readable only by root because it's going to contain a valuable password.

sudo chmod go-wrx /etc/ha.d/authkeys

Edit /etc/ha.d/authkeys and add a password of your choice so that it looks like below.

auth 2
2 sha1 your-password-here

Configure ha.cf according to your network. In this case the nodes are lb-db-01.vm.xeriom.net and lb-db-02.vm.xeriom.net. To figure out what your node names are run uname -n on each of the nodes. These must match the values you use in the node directives in the configuration file.

logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
initdead 120
bcast eth0
udpport 694
auto_failback on
node lb-db-01.vm.xeriom.net
node lb-db-02.vm.xeriom.net
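
A mismatch between the node directives and uname -n is easy to miss by eye. Here's a small sketch that pulls the node names out of the text of an ha.cf file and checks a hostname against them. ha_nodes, node_listed? and SAMPLE_HA_CF are hypothetical names for illustration, and I'm assuming the simple "node name [name ...]" form shown above.

```ruby
# Extract the node names from the text of an ha.cf file.
# Blank lines and comment lines are ignored.
def ha_nodes(config_text)
  config_text.each_line.map(&:strip)
             .reject { |line| line.empty? || line.start_with?("#") }
             .select { |line| line.split.first == "node" }
             .flat_map { |line| line.split.drop(1) }
end

# True if the given hostname appears in a node directive.
def node_listed?(config_text, hostname)
  ha_nodes(config_text).include?(hostname)
end

# A cut-down copy of the configuration above, used as sample input.
SAMPLE_HA_CF = <<~CONF
  keepalive 2
  # nodes in the cluster
  node lb-db-01.vm.xeriom.net
  node lb-db-02.vm.xeriom.net
CONF
```

On a real box you could feed it File.read("/etc/ha.d/ha.cf") and Socket.gethostname (after require "socket"). Note that the short name lb-db-01 would not match the fully qualified name in the file.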

Tell Heartbeat that it will be managing the floating IP address with lb-db-01 being the preferred node by editing /etc/ha.d/haresources. Remember that this file must be identical on both boxes.

lb-db-01.vm.xeriom.net 193.219.108.243

If you've had Heartbeat running on the database boxes (as will be the case from the last article) then nuke it now.

# On the database boxes
sudo apt-get remove heartbeat

Then remove the alias from eth0 on both boxes.

# On the database boxes
sudo ifconfig eth0 inet 193.219.108.243 -alias

Now we're ready to fire up Heartbeat on the load balancer boxes.

# On lb-db-01 then lb-db-02
sudo /etc/init.d/heartbeat restart

Testing, testing, testing

Fire up mysql on the test box and connect to the floating IP address. You should get the MySQL command prompt.

mysql -u some_user -p'some_other_password' -h 193.219.108.243 my_application

Typing out exactly what is done to test this would take a long time and, largely, would be a waste of space. Here's a summary of the procedure. At all stages you should get a result from your query.

  1. Run a query such as show processlist;
  2. Shutdown db-01
  3. Run the query again
  4. Start db-01
  5. Shutdown db-02
  6. Run the query again
  7. Start db-02
  8. Shutdown lb-db-01
  9. Run the query again
  10. Shutdown db-01
  11. Run the query again
  12. Start db-01
  13. Shutdown db-02
  14. Run the query again
  15. Start db-02
  16. Start lb-db-01
  17. Run the query again

If your query ran successfully each time then congratulations, you've now got a load balanced, highly available, MySQL instance.
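
To see why every query in that procedure should succeed, here's a toy model in Ruby. It assumes - and this is a simplification of mine, not something Heartbeat or MySQL Proxy guarantees - that a query through the floating IP works whenever at least one load balancer and at least one database node are up. It ignores failover delay entirely. The node symbols and method names are made up for the sketch.

```ruby
# Toy model: a query succeeds if any load balancer and any database are up.
def query_succeeds?(up)
  [:lb_db_01, :lb_db_02].any? { |node| up[node] } &&
    [:db_01, :db_02].any? { |node| up[node] }
end

# The seventeen steps of the test procedure, in order.
STEPS = [
  [:query],
  [:stop, :db_01], [:query], [:start, :db_01],
  [:stop, :db_02], [:query], [:start, :db_02],
  [:stop, :lb_db_01], [:query],
  [:stop, :db_01], [:query], [:start, :db_01],
  [:stop, :db_02], [:query], [:start, :db_02],
  [:start, :lb_db_01], [:query]
].freeze

# Walk the steps, tracking which nodes are up, and record each query result.
def run_procedure(steps)
  up = Hash.new(true) # every node starts out running
  steps.each_with_object([]) do |(action, node), results|
    case action
    when :stop  then up[node] = false
    when :start then up[node] = true
    when :query then results << query_succeeds?(up)
    end
  end
end

RESULTS = run_procedure(STEPS)
```

The procedure never takes down both database nodes or both load balancers at once, so under this model all seven queries succeed. In real life, give Heartbeat time to move the floating IP before retrying the query.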

Where now?

Being highly available and load balanced doesn't protect you from mistakes. Back up often, and check you can restore from your backups. You may be interested in building a MySQL binlog-only server to get point-in-time recovery.

MySQL Proxy talks Lua. Consider learning how to write it.

I've not yet documented how to take the cluster beyond two load balancers and two database nodes. It's possible, but it shouldn't be used as a solution to scaling the setup I've described without some research. Instead of expanding beyond two nodes in a master-master cluster it may be more suitable to set up several master-master nodes and shard or federate your data. It may be that you need to rearrange your schema or play with master-slave replication and do some tricks on the slave to make reads faster. How you scale your database depends on your data and how you use it. Do your homework... and be sure to blog about it and let me know how it goes.

Thinking of a title is the hardest part

If you found this article useful, give me some love over at Working With Rails. If I get 100 points then I get to live.


Avoiding auto_increment collision with High Availability MySQL

If you followed my previous post about high availability MySQL your application now has one less single point of failure. That's good, but as Graeme points out there's a possibility of data collision if the replication process fails.

If replication has stopped and a query inserts into db-01 while a second query inserts into db-02 then the value of any auto_increment columns will be the same. When you get replication running again this will cause a problem.

To avoid this situation we can use auto-increment-increment and auto-increment-offset. These variables affect the way that MySQL generates the next value in an auto-incrementing series.

# On db-01, in /etc/mysql/my.cnf
auto-increment-increment = 10
auto-increment-offset = 1
# On db-02, in /etc/mysql/my.cnf
auto-increment-increment = 10
auto-increment-offset = 2

Restart MySQL on both boxes and you should now be safe from this threat of data collision.
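
Here's a quick sketch of why those settings keep the two nodes apart. It models only the simplest case - a fresh table, where the generated values are offset, offset + increment, offset + 2 * increment and so on. auto_increment_values is a made-up helper, not anything MySQL provides.

```ruby
# The first `count` values generated on an empty table for the given
# auto-increment-offset and auto-increment-increment.
def auto_increment_values(offset, increment, count)
  Array.new(count) { |n| offset + n * increment }
end

DB01_IDS = auto_increment_values(1, 10, 5) # db-01: 1, 11, 21, 31, 41
DB02_IDS = auto_increment_values(2, 10, 5) # db-02: 2, 12, 22, 32, 42

# No value is ever generated by both nodes.
OVERLAP = DB01_IDS & DB02_IDS
```

An increment of 10 leaves room for up to ten nodes, each with its own offset, at the cost of gaps in the id sequence.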

Love me!

If you've found this article useful I'd appreciate beer and / or recommendations at Working With Rails.


High Availability MySQL on Ubuntu 8.04

In my previous post I showed how to implement a high availability web tier using Heartbeat and Apache. If you followed that you're probably pretty much sorted for serving static webpages, but what about dynamic webpages that are database driven? How do we make sure that the database is protected against failure of one of our nodes?

Preparation

You'll need two boxes and three IP addresses. Again, I've used virtual machines from Xeriom Networks. I've firewalled them and opened the MySQL and Heartbeat ports so that the servers can communicate with each other but no one else can access them.

# On db-01
sudo iptables -I INPUT 3 -p tcp --dport mysql -s db-02.vm.xeriom.net -j ACCEPT
sudo iptables -I INPUT 3 -p udp --dport mysql -s db-02.vm.xeriom.net -j ACCEPT
sudo iptables -I INPUT 3 -p udp --dport 694 -s db-02.vm.xeriom.net -j ACCEPT

# On db-02
sudo iptables -I INPUT 3 -p tcp --dport mysql -s db-01.vm.xeriom.net -j ACCEPT
sudo iptables -I INPUT 3 -p udp --dport mysql -s db-01.vm.xeriom.net -j ACCEPT
sudo iptables -I INPUT 3 -p udp --dport 694 -s db-01.vm.xeriom.net -j ACCEPT

Your firewall rules should now look something like the listing below. The important lines are those ending in tcp dpt:mysql, udp dpt:mysql and dpt:694. The source for those lines should be the other node, not the one you're checking the rules on: db-01 should have rules opening ports for db-02, and db-02 should have rules opening ports for db-01.

Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             anywhere            
ACCEPT     all  --  anywhere             anywhere            state RELATED,ESTABLISHED 
ACCEPT     udp  --  db-01                anywhere            udp dpt:694 
ACCEPT     udp  --  db-01                anywhere            udp dpt:mysql 
ACCEPT     tcp  --  db-01                anywhere            tcp dpt:mysql 
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:ssh

All being well, save your firewall rules so they're restored at reboot.

sudo sh -c "iptables-save -c > /etc/iptables.rules"

For the purpose of this post, let's assume that the following IP addresses are available and assigned to the boxes in brackets.

  • 193.219.108.241 - db-01 (db-01.vm.xeriom.net)
  • 193.219.108.242 - db-02 (db-02.vm.xeriom.net)
  • 193.219.108.243 - Not assigned

Start small

To begin with we'll install and configure MySQL for normal use on each of the boxes.

sudo apt-get install mysql-server --yes

Set a strong MySQL root password and wait for the packages to download and install, then edit /etc/mysql/my.cnf to make MySQL listen on all IP addresses.

bind-address = 0.0.0.0

Now restart MySQL and fire up the MySQL command-line client to check all is good.

sudo /etc/init.d/mysql restart
mysql -u root -p
Enter password: [enter the MySQL root password you chose earlier]
mysql> \q

If you got the mysql> prompt then MySQL is running. Try connecting to the other node across the network to see if the firewall is opened and MySQL is listening on the network interface.

mysql -h db-02.vm.xeriom.net -u root -p
Enter password: [enter the MySQL root password you chose earlier]
ERROR 1130 (00000): Host 'db-01' is not allowed to connect to this MySQL server

If you got the above error then everything is working fine - MySQL connected and refused to authorise the client. We'll create some valid accounts for this later. If you got a different error (such as the one below), check MySQL is running on both boxes and that the firewall rules are allowing connections from the correct hosts.

Can't connect to MySQL server on 'db-02' (10061)

One-way replication

The first thing we want to do is set up simple master-slave replication to see that it's possible to replicate data from one database host to the other. This requires a binary log, so tell MySQL on db-01 to keep one. Edit /etc/mysql/my.cnf and set the following values under the replication section.

server-id               = 1 
log_bin                 = /var/log/mysql/mysql-bin.log
expire_logs_days        = 10
max_binlog_size         = 100M
binlog_do_db            = my_application
binlog_ignore_db        = mysql
binlog_ignore_db        = test

On db-01 grant replication slave rights to db-02. Change some_password to a real, strong password. Afterwards, make sure you restart MySQL.

mysql -u root -p
Enter password: [enter the MySQL root password you chose earlier]
mysql> grant replication slave on *.* to 'replication'@'db-02.vm.xeriom.net' identified by 'some_password';
mysql> \q
sudo /etc/init.d/mysql restart

Jump on to db-02 and set it up to replicate data from db-01 by editing /etc/mysql/my.cnf, again replacing the hostname, username and password with the values for db-01.

server-id                 = 2
master-host               = db-01.vm.xeriom.net
master-user               = replication
master-password           = some_password
master-port               = 3306

One-way replication should now be set up. Restart MySQL and check the status of the slave on db-02. If the Slave_IO_State is "Waiting for master to send event" then you've been successful.

# Run this on db-02 only
sudo /etc/init.d/mysql restart
mysql -u root -p
Enter password: [enter the MySQL root password you chose earlier]
mysql> show slave status \G
*************************** 1. row ***************************
             Slave_IO_State: Waiting for master to send event
                Master_Host: 193.219.108.241
                Master_User: replication
                Master_Port: 3306
              Connect_Retry: 60
            Master_Log_File: mysql-bin.000005
        Read_Master_Log_Pos: 98
             Relay_Log_File: mysqld-relay-bin.000004
              Relay_Log_Pos: 235
      Relay_Master_Log_File: mysql-bin.000005
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
            Replicate_Do_DB: 
        Replicate_Ignore_DB: 
         Replicate_Do_Table: 
     Replicate_Ignore_Table: 
    Replicate_Wild_Do_Table: 
Replicate_Wild_Ignore_Table: 
                 Last_Errno: 0
                 Last_Error: 
               Skip_Counter: 0
        Exec_Master_Log_Pos: 98
            Relay_Log_Space: 235
            Until_Condition: None
             Until_Log_File: 
              Until_Log_Pos: 0
         Master_SSL_Allowed: No
         Master_SSL_CA_File: 
         Master_SSL_CA_Path: 
            Master_SSL_Cert: 
          Master_SSL_Cipher: 
             Master_SSL_Key: 
      Seconds_Behind_Master: 0
1 row in set (0.00 sec)
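
Only three of those fields really matter for a health check. If you'd rather not eyeball the output every time, here's a sketch that parses the \G format and checks them. parse_slave_status and replication_healthy? are names I've invented, and the parser assumes the plain "Key: value" layout shown above.

```ruby
# Turn the "Key: value" lines of `show slave status \G` into a hash.
# Lines without a colon (row separators, the row count) are skipped.
def parse_slave_status(text)
  text.each_line.each_with_object({}) do |line, fields|
    key, value = line.split(":", 2)
    next if value.nil?
    fields[key.strip] = value.strip
  end
end

# Replication is considered healthy when the IO thread is waiting on the
# master and both slave threads are running.
def replication_healthy?(fields)
  fields["Slave_IO_State"] == "Waiting for master to send event" &&
    fields["Slave_IO_Running"] == "Yes" &&
    fields["Slave_SQL_Running"] == "Yes"
end

# A trimmed copy of the output above, used as sample input.
SAMPLE_STATUS = <<~OUT
  *************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 193.219.108.241
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
  1 row in set (0.00 sec)
OUT

HEALTHY = replication_healthy?(parse_slave_status(SAMPLE_STATUS))
```

You'd still have to get the text out of the mysql client yourself; this only covers the checking part.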

All being well it's time to test replication is working. We'll create the database we've configured replication for (my_application) on db-01 and watch as it appears on db-02 as well.

# On both nodes
mysql -u root -p
Enter password: [enter the MySQL root password you chose earlier]
mysql> show databases;

There should be two - mysql and test.

# On db-01 only
mysql -u root -p
Enter password: [enter the MySQL root password you chose earlier]
mysql> create database my_application;
# On both nodes
mysql -u root -p
Enter password: [enter the MySQL root password you chose earlier]
mysql> show databases;

The new database, my_application, should appear in the output of both nodes. Success! If it doesn't show on both nodes (it didn't for me the first time I set it up), here are some tips for finding out what's wrong.

Trouble-shooting one-way replication

If the slave status above doesn't show Slave_IO_State: Waiting for master to send event, Slave_IO_Running: Yes and Slave_SQL_Running: Yes then something is wrong. This happened a few times while I was setting up replication - here's how I debugged it.

Telnet is one of the best tools in the world for debugging connectivity issues. If you haven't already, install it now.

sudo apt-get install telnet

SSH to the node that you want to check connectivity from (db-02) and telnet to the other node (db-01) on the MySQL port (3306).

# on db-02
telnet db-01.vm.xeriom.net mysql

The problem I encountered was ERROR 1130 (00000): Host 'db-02' is not allowed to connect to this MySQL server. This happens when an incorrect hostname was used in the grant replication slave query above. In my case I had granted access to clients using the full hostname (db-02.vm.xeriom.net) but MySQL looked in /etc/hosts and found a short name (db-02). Run the grant replication slave query again using the hostname given in the error message.

# on db-01
mysql -u root -p
Enter password: [enter the MySQL root password you chose earlier]
mysql> grant replication slave on *.* to 'replication'@'db-02' identified by 'some_password';
mysql> \q
sudo /etc/init.d/mysql restart

Another problem I encountered was that the slave status remained "connecting to master" for a long time. If you can connect using telnet this is probably caused by the server-id being the same on both servers. Check in /etc/mysql/my.cnf and if necessary change the values and restart MySQL.

Master-master replication

The above setup will replicate data one-way, but if you happen to write to the slave (db-02) then at best the data stored in the databases will be inconsistent, and there's a large possibility that replication will fail from that point onwards.

Setting up the master database so that it replicates data back from the slave would allow us to have a consistent data-set on both databases regardless of which we updated.

On db-02 edit /etc/mysql/my.cnf and configure it to keep a binary log of updates to the appropriate databases.

log_bin                 = /var/log/mysql/mysql-bin.log
expire_logs_days        = 10
max_binlog_size         = 100M
binlog_do_db            = my_application
binlog_ignore_db        = mysql
binlog_ignore_db        = test

Jump into MySQL on db-02 and grant replication slave privileges to the replication user on db-01.

# On db-02
mysql -u root -p
Enter password: [enter the MySQL root password you chose earlier]
mysql> grant replication slave on *.* to 'replication'@'db-01.vm.xeriom.net' identified by 'some_password';

Next, edit db-01 to replicate data using this account. Edit /etc/mysql/my.cnf and set the values of the new master on db-02.

master-host               = db-02.vm.xeriom.net
master-user               = replication
master-password           = some_password
master-port               = 3306

Restart MySQL on both boxes and check that the slaves are reading from the appropriate master (db-01 reads from db-02 and db-02 reads from db-01).

sudo /etc/init.d/mysql restart
mysql -u root -p
Enter password: [enter the MySQL root password you chose earlier]
mysql> show slave status \G

If you don't get output that says Slave_IO_State: Waiting for master to send event, Slave_IO_Running: Yes and Slave_SQL_Running: Yes on both boxes then run through the trouble shooting section above.

If you've got this far your database is now running as a Master-Master cluster. Mmm, redundancy.

Heartbeat

The data is replicated two ways across the network so our data is protected against one host going down, but at the moment we still need to configure our applications to use one or the other host: failover must be handled by the application.

I wrote previously about using Heartbeat to provide a high availability web tier. We'll use the same technique to provide a floating IP address for the database. Our applications will connect to this IP address, and Heartbeat will make sure it's pointing at a live database. Since the databases are replicating data between each other it doesn't matter which database node our applications end up connecting to.

Install and configure Heartbeat on both boxes.

sudo apt-get install heartbeat

Next we'll copy and customise the authkeys, ha.cf and haresources files from the sample documentation to the configuration directory.

sudo cp /usr/share/doc/heartbeat/authkeys /etc/ha.d/
sudo sh -c "zcat /usr/share/doc/heartbeat/ha.cf.gz > /etc/ha.d/ha.cf"
sudo sh -c "zcat /usr/share/doc/heartbeat/haresources.gz > /etc/ha.d/haresources"

The authkeys file should be readable only by root because it's going to contain a valuable password.

sudo chmod go-wrx /etc/ha.d/authkeys

Edit /etc/ha.d/authkeys and add a password of your choice so that it looks like below.

auth 2
2 sha1 your-password-here

Configure ha.cf according to your network. In this case the nodes are db-01.vm.xeriom.net and db-02.vm.xeriom.net. To figure out what your node names are run uname -n on each of the database boxes. The values you use in the node directives in the configuration file must match the names in uname -n.

logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
initdead 120
bcast eth0
udpport 694
auto_failback on
node db-01.vm.xeriom.net
node db-02.vm.xeriom.net

We need to tell Heartbeat we want it to look after MySQL. Edit haresources and make it look like the following - still on both machines.

db-01.vm.xeriom.net 193.219.108.243

This file must be identical on both nodes - even the hostname, which should be the output of uname -n on node 1. The IP address should be the unassigned IP address given above in the Preparation section.

Start heartbeat on db-01 then db-02.

sudo /etc/init.d/heartbeat start

This process takes quite a while to start up. tail -f /var/log/ha-log on both boxes to watch what's happening. After a while you should see db-01 say something about completing acquisition.

heartbeat[7734]: 2008/07/07_17:19:34 info: Initial resource acquisition complete (T_RESOURCES(us))
IPaddr[7739]:   2008/07/07_17:19:37 INFO:  Running OK
heartbeat[7745]: 2008/07/07_17:19:37 info: Local Resource acquisition completed.

Testing it all works

Until now both boxes have been firewalled to allow MySQL connections only from each other. To prove that the database failover works we'll have to connect from another box, possibly your desktop or laptop. Find the public IP address of your chosen machine (here it's 193.214.108.10) and add it to the accept list on both boxes on the heartbeat IP address.

# On both boxes
sudo iptables -I INPUT 3 -p tcp --dport mysql -s 193.214.108.10 -d 193.219.108.243 -j ACCEPT

Create a user which you can use to query the database, again on both boxes.

# on both boxes
mysql -u root -p
Enter password: [enter the MySQL root password you chose earlier]
mysql> grant all on my_application.* to 'some_user'@'193.214.108.10' identified by 'some_other_password';
mysql> grant replication client on *.* to 'some_user'@'193.214.108.10';
mysql> \q

Now connect to the IP address Heartbeat is managing (193.219.108.243) from your test box and run a query to show the slave status.

mysql -u some_user -p -h 193.219.108.243 my_application
mysql> show slave status \G
*************************** 1. row ***************************
             Slave_IO_State: Waiting for master to send event
                Master_Host: 193.219.108.242
[unimportant lines snipped]

Note that the master host is db-02. Stop heartbeat (or shutdown db-01) and run the query again. You should now see that the master has changed to the IP address of the other node.

Finally, bring Heartbeat back up on db-01 (or start the box if you stopped it) and run the query again. The master host should be the same as the first time.

Auto increment offsets

To avoid problems if the replication process fails, check out avoiding auto_increment collision.

Love me!

If you've found this article useful I'd appreciate beer and recommendations at Working With Rails.


Installing CouchDB 0.8.0 on Ubuntu 8.04

CouchDB is a distributed document store which can be manipulated using HTTP. A more detailed introduction is available on the CouchDB site.

Some assembly required

Since CouchDB is still a fairly young project there are no packages available to install it on Ubuntu. There are rumblings which seem to indicate that Intrepid Ibis will have a package, but until then here's a quick-n-dirty way to get CouchDB running on Ubuntu 8.04.

sudo apt-get install automake autoconf libtool subversion-tools help2man 
sudo apt-get install build-essential erlang libicu38 libicu-dev
sudo apt-get install libreadline5-dev checkinstall libmozjs-dev wget
wget http://mirror.public-internet.co.uk/ftp/apache/incubator/couchdb/0.8.0-incubating/apache-couchdb-0.8.0-incubating.tar.gz
tar -xzvf apache-couchdb-0.8.0-incubating.tar.gz
cd apache-couchdb-0.8.0-incubating
./configure
make && sudo make install
sudo adduser couchdb
sudo mkdir -p /usr/local/var/lib/couchdb
sudo chown -R couchdb /usr/local/var/lib/couchdb
sudo mkdir -p /usr/local/var/log/couchdb
sudo chown -R couchdb /usr/local/var/log/couchdb
sudo mkdir -p /usr/local/var/run
sudo chown -R couchdb /usr/local/var/run
sudo cp /usr/local/etc/init.d/couchdb /etc/init.d/
sudo update-rc.d couchdb defaults
sudo /etc/init.d/couchdb start

Let others REST on your Couch

By default CouchDB listens only for connections from the local host. To change that edit /usr/local/etc/couchdb/couch.ini and restart CouchDB.

If you're running a firewall (you should be) then open the correct port.

sudo iptables -I INPUT 3 -p tcp --dport 5984 -j ACCEPT

Testing that it all works

Since CouchDB talks HTTP we can use any HTTP client to check that it's running. Our web browser, for example. Fire it up and hit the IP address of the server on port 5984. If it's running and you can access it you should get back some details about the server.

{"couchdb":"Welcome","version":"0.8.0-incubating"}
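
The same check is easy to script. Here's a sketch that takes the response body and pulls out the version. couchdb_version is a made-up helper, and it assumes the welcome document has exactly the two keys shown above.

```ruby
require "json"

# Returns the version string from CouchDB's welcome document,
# or nil if the body doesn't look like a CouchDB welcome message.
def couchdb_version(body)
  info = JSON.parse(body)
  info["couchdb"] == "Welcome" ? info["version"] : nil
rescue JSON::ParserError
  nil
end

# The response shown above, used as sample input.
SAMPLE_BODY = '{"couchdb":"Welcome","version":"0.8.0-incubating"}'
VERSION = couchdb_version(SAMPLE_BODY)
```

To run it against a live server, fetch the body with Net::HTTP.get(URI("http://your-server:5984/")) after require "net/http", swapping your-server for the address of your CouchDB box.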

Love me!

If you've found this article useful I'd appreciate recommendations at Working With Rails.


High Availability Apache on Ubuntu 8.04

It's nice when your website continues to be served even when something catastrophic happens. Running two Apache nodes and Heartbeat will help - if one server blows up, the other will take over in short order.

Prelude

You'll need two boxes and three IP addresses. I use virtual machines from Xeriom Networks. I've firewalled them and opened the HTTP port to the world.

sudo iptables -I INPUT 3 -p tcp --dport http -j ACCEPT
sudo sh -c "iptables-save -c > /etc/iptables.rules"

For the purpose of this post, let's assume that the following IP addresses are available.

  • 193.219.108.236 - Node 1 (craig-02.vm.xeriom.net)
  • 193.219.108.237 - Node 2 (craig-03.vm.xeriom.net)
  • 193.219.108.238 - Not assigned

Simple Service

First we'll set up Apache on both boxes. Nothing complex - we just want to make sure that we can serve something to HTTP clients.

Run the following command on both boxes.

sudo apt-get install apache2 --yes

Now fire up a browser and hit the IP addresses assigned to Node 1 and Node 2. You should see the default Apache page stating "It works!". If you don't, check your firewall allows www traffic. Your firewall rules should look like the below - note the line ending tcp dpt:www.

sudo iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             anywhere            state RELATED,ESTABLISHED 
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:ssh 
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:www
DROP       all  --  anywhere             anywhere            

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Adding resilience

Apache can serve web pages from your machines now - that's great, but it doesn't protect against one of the machines dying. For that, we use a tool called heartbeat.

Install and configure Heartbeat on both boxes.

sudo apt-get install heartbeat

Next we'll copy and customise the authkeys, ha.cf and haresources files from the sample documentation to the configuration directory.

sudo cp /usr/share/doc/heartbeat/authkeys /etc/ha.d/
sudo sh -c "zcat /usr/share/doc/heartbeat/ha.cf.gz > /etc/ha.d/ha.cf"
sudo sh -c "zcat /usr/share/doc/heartbeat/haresources.gz > /etc/ha.d/haresources"

The authkeys file should be readable only by root because it's going to contain a valuable password.

sudo chmod go-wrx /etc/ha.d/authkeys

Edit /etc/ha.d/authkeys and add a password of your choice so that it looks like below.

auth 2
2 sha1 your-password-here
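
If you can't think of a good password, Ruby can invent one. This is only a sketch: authkeys_contents and random_secret are names I've made up, and the 32 character length is an arbitrary choice. It prints a file body in the same two-line format as above, which you'd then paste into authkeys on both boxes.

```ruby
require "securerandom"

# Builds the body of authkeys in the format shown above: the "auth" line
# picks a key number, and the matching line gives the method and secret.
def authkeys_contents(secret, key_id = 2)
  "auth #{key_id}\n#{key_id} sha1 #{secret}\n"
end

# 16 random bytes rendered as 32 hex characters (length is an assumption).
def random_secret
  SecureRandom.hex(16)
end

puts authkeys_contents(random_secret)
```

Both nodes need the same secret, so generate it once and copy it rather than running this on each box.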

Configure ha.cf according to your network. In this case the nodes are craig-02.vm.xeriom.net and craig-03.vm.xeriom.net. To figure out what your node names are, run uname -n on each of the nodes. These must match the values you use in the node directives in the configuration file.

logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
initdead 120
bcast eth0
udpport 694
auto_failback on
node craig-02.vm.xeriom.net
node craig-03.vm.xeriom.net
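
A mismatch between uname -n and the node directives is an easy mistake to make, so here's a small sketch that pulls the node names out of ha.cf text and checks a hostname against them. The helper names ha_cf_nodes and node_listed? are hypothetical, and the parsing only understands the simple layout used in this post.

```ruby
# Returns the hostnames named in `node` directives, ignoring comments and
# every other directive.
def ha_cf_nodes(ha_cf_text)
  ha_cf_text.each_line.flat_map do |line|
    directive, *values = line.sub(/#.*/, "").split
    directive == "node" ? values : []
  end
end

# True when the given hostname appears in a node directive.
def node_listed?(ha_cf_text, hostname)
  ha_cf_nodes(ha_cf_text).include?(hostname)
end

# On a real box you might call it like this:
#   node_listed?(File.read("/etc/ha.d/ha.cf"), `uname -n`.strip)
```

If it returns false on either box, fix the node lines before you go any further.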

We need to tell Heartbeat we want it to look after Apache. Edit haresources and make it look like the following - still on both machines.

craig-02.vm.xeriom.net 193.219.108.238 apache2

This file must be identical on both nodes - even the hostname, which should be the output of uname -n on node 1. The IP address should be the unassigned IP address given above in the prelude section.
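
To convince yourself the two copies of haresources really say the same thing, you could compare them with something like the sketch below. HaResource, parse_haresources and same_haresources? are names invented for this example, and the parser assumes the node, IP address, services layout used in this post rather than everything haresources can express.

```ruby
# One haresources entry as laid out in this post.
HaResource = Struct.new(:preferred_node, :ip_address, :services)

# Parses haresources text, skipping comments and blank lines.
def parse_haresources(text)
  text.each_line
      .map { |line| line.sub(/#.*/, "").split }
      .reject(&:empty?)
      .map { |node, ip, *services| HaResource.new(node, ip, services) }
end

# Compares two copies entry by entry, ignoring comments and spacing.
def same_haresources?(first, second)
  parse_haresources(first) == parse_haresources(second)
end
```

Copy the file from the other node, read both with File.read, and pass the two strings to same_haresources?.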

In ha.cf we told Heartbeat to use UDP port 694 to communicate but because we're all nicely firewalled this port is blocked. Open it on both boxes.

sudo iptables -I INPUT 2 -p udp --dport 694 -j ACCEPT

Your iptables rules should now look similar to the output below.

sudo iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             anywhere            state RELATED,ESTABLISHED 
ACCEPT     udp  --  anywhere             anywhere            udp dpt:694 
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:ssh 
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:www 
DROP       all  --  anywhere             anywhere            

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

Now create a file on each box that tells us which webserver we're looking at.

# Node 1 (craig-02.vm.xeriom.net)
sudo sh -c 'echo "craig-02.vm.xeriom.net" > /var/www/index.html'
# Node 2 (craig-03.vm.xeriom.net)
sudo sh -c 'echo "craig-03.vm.xeriom.net" > /var/www/index.html'

Check that this file shows up on each box by hitting the nodes' IP addresses in the browser. If that works, it's time to flip the switch.

It lives... IT LIVES!

Start heartbeat on the master (node 1 / craig-02.vm.xeriom.net) then the slave (node 2 / craig-03.vm.xeriom.net).

sudo /etc/init.d/heartbeat start

This process takes quite a while to start up. Run tail -f /var/log/ha-log on both boxes to watch what's happening. After a while you should see node 1 say something like this.

heartbeat[6792]: 2008/06/24_11:06:21 info: Initial resource acquisition complete (T_RESOURCES(us))
IPaddr[6867]:   2008/06/24_11:06:22 INFO:  Running OK
heartbeat[6832]: 2008/06/24_11:06:22 info: Local Resource acquisition completed.

Testing for a broken heart

If you now check the output of ifconfig eth0:0 on both boxes you should see output like below.

# Node 1
sudo ifconfig eth0:0
eth0:0    Link encap:Ethernet  HWaddr 00:16:3e:3c:70:25  
          inet addr:193.219.108.238  Bcast:193.219.108.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
# Node 2
sudo ifconfig eth0:0
eth0:0    Link encap:Ethernet  HWaddr 00:16:3e:92:ad:78  
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

Node 1 has taken over our virtual IP address. If you kill Node 1, Node 2 will take it over. You can simulate this by taking down the Heartbeat process on Node 1.

# Node 1
sudo /etc/init.d/heartbeat stop

Checking ifconfig again you should see that the virtual IP address has swapped nodes. If you bring up Node 1 again (start heartbeat) you should see the IP address swap back to that node.
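
The ifconfig check is easy to script if you get bored of reading the output. A minimal sketch, assuming the older "inet addr:" layout shown above; holds_address? is a made-up name.

```ruby
# True when the ifconfig output shows the given IPv4 address bound to the
# interface. The trailing (\s|\z) stops 193.219.108.23 matching ...238.
def holds_address?(ifconfig_output, ip)
  pattern = /inet addr:#{Regexp.escape(ip)}(\s|\z)/
  ifconfig_output.each_line.any? { |line| line.match?(pattern) }
end

# On a real node you might call it like this:
#   holds_address?(`ifconfig eth0:0`, "193.219.108.238")
```

Exactly one of the two nodes should answer true at any given time.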

If you got this far with no problems then congratulations, Heartbeat is running and your web tier will survive failure of a node. You can skip to the next section to see it working in the browser.

If you see some lines in the ha-log file telling you that the message queue is filling up then it's likely the two nodes can't communicate with each other. Check that you opened UDP port 694 on the firewall of both boxes.

heartbeat[6148]: 2008/06/24_11:05:09 ERROR: Message hist queue is filling up (500 messages in queue)

Check that the firewall rules look like the output below - the important line is the one ending in udp dpt:694.

sudo iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             anywhere            state RELATED,ESTABLISHED 
ACCEPT     udp  --  anywhere             anywhere            udp dpt:694 
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:ssh 
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:www 
DROP       all  --  anywhere             anywhere            

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
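
Spotting the queue message in a busy log is tedious, so here's a sketch that pulls the queue sizes out of ha-log text. The pattern is taken from the single error line quoted above, so treat it as an assumption about how other Heartbeat versions word the message; queue_warnings is a hypothetical name.

```ruby
# Matches the error line quoted above and captures the queue size.
QUEUE_WARNING = /ERROR: Message hist queue is filling up \((\d+) messages in queue\)/

# Returns the queue size from every matching line, in log order.
def queue_warnings(log_text)
  log_text.scan(QUEUE_WARNING).flatten.map(&:to_i)
end

# On a real node you might call it like this:
#   queue_warnings(File.read("/var/log/ha-log"))
```

An empty array means the message never appeared; a list of growing numbers suggests the nodes still can't hear each other.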

The proof is in the pudding

Mmm, cake.

Fire up your browser and hit the virtual IP address (193.219.108.238 in this post). You should see a page telling you that you're on Node 1.

Stop heartbeat (or shutdown Node 1) and hit the IP address again in the browser. You should now see that you're hitting Node 2.

Finally, bring Heartbeat back up on Node 1 (or start the box if you stopped it) and hit the IP address again. You should now be hitting Node 1 again.
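
If you'd rather watch the failover from a terminal than keep hitting refresh, something like this sketch would do. observe_nodes is a name I've invented; the fetch and pause arguments exist only so the method can be exercised without a network, and by default it uses Net::HTTP.get and sleep.

```ruby
require "net/http"
require "uri"

# Polls the URI and returns each distinct response in the order it was
# seen, collapsing consecutive repeats. A failed request is recorded as
# "unreachable" rather than stopping the run.
def observe_nodes(uri, polls:, interval: 1,
                  fetch: ->(u) { Net::HTTP.get(u) },
                  pause: ->(seconds) { sleep seconds })
  seen = []
  polls.times do |i|
    body = begin
      fetch.call(uri).strip
    rescue StandardError => e
      "unreachable (#{e.class})"
    end
    seen << body unless seen.last == body
    pause.call(interval) if i < polls - 1
  end
  seen
end

# Example call against the virtual IP used in this post:
#   p observe_nodes(URI("http://193.219.108.238/"), polls: 60)
```

Run it, stop and start Heartbeat on Node 1 while it polls, and the result should list Node 1, then Node 2, then Node 1 again, possibly with an unreachable entry in between while the address moves.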

Offline tasks the easy way

There's been quite a lot of chat on the LRUG list recently about various job scheduling systems and process managers for offlining expensive tasks. BackgrounDRb, Beanstalk, Starling, BackgroundJob and other similar solutions have been discussed. These systems can be useful, but most of the time they're just adding unnecessary complexity.

One instance where I feel these solutions are unnecessary is where you need to strip data from an external service in a way that's totally disconnected from the HTTP request-response cycle.

Say you want to pull the most recent article from this blog every 15 minutes and create a file that could then be served statically to your visitors. A naive implementation of that functionality would look something like this:

require 'net/http'
require 'hpricot'

barking_iguana = URI.parse('http://barkingiguana.com/')
loop do
  articles = Hpricot(Net::HTTP.get(barking_iguana))
  title = (articles / "div.article a[@rel=bookmark] text()").first
  link = (articles / "div.article a[@rel=bookmark]").first['href']

  # Of course, this should have a real file path in it.
  File.open("/.../.../.../barking_iguana.ssi", "w+") do |f|
    f.write("#{title}: #{link}")
    f.flush
  end

  sleep 900 # 15 minutes
end

Doesn't that look nice? No screwing around with complex tools to handle the scheduling - just run it and it'll go forever.
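
One thing the bare loop won't survive is a single failed fetch: an exception from the network would end it. If that bothers you, a few more lines keep it going. This is a sketch, not part of the script above; every is a made-up helper, and the iterations and pause arguments are only there so it can be tried out without waiting 15 minutes.

```ruby
# Runs the block every `interval` seconds. A StandardError raised by one
# run (a network timeout, say) is reported on stderr and the loop carries
# on. With iterations left as nil it loops forever.
def every(interval, iterations: nil, pause: ->(seconds) { sleep seconds })
  runs = 0
  until iterations && runs >= iterations
    begin
      yield
    rescue StandardError => e
      warn "run failed: #{e.class}: #{e.message}"
    end
    runs += 1
    pause.call(interval)
  end
end

# Usage: wrap the body of the loop from the script above.
#   every(900) { fetch_and_write_latest_article }  # hypothetical method
```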

"Ah," I hear you say, "but what if it crashes?" Well, in the unlikely event that such a simple script does crash I'd have something like God monitoring the processes so it would be restarted. You've got something monitoring your processes anyway (right?) so it should be pretty simple to add another process to that list.

About the boy

Hi, I'm Craig and I'm a Ruby coder. I live, work and play in London. I like scaling applications and eating yoghurt. Sometimes I climb rocks. Most of the time I climb back down.

You can contact me by email, MSN or Jabber. My address on all of these is craig@xeriom.net.

Code Licence

You can use any of the code on this blog in any way you want. It's totally public domain. You don't even need to attribute it to me, although it would be nice if you did. Just don't sue me.
