In my previous post I showed how to implement a high availability web tier using Heartbeat and Apache. If you followed that you're probably pretty much sorted for serving static webpages, but what about dynamic webpages that are database driven. How do we make sure that the database is protected against failure of one of our nodes?

Preparation

You'll need two boxes and three IP addresses. Again, I've used virtual machines from Xeriom Networks. I've firewalled them and opened the MySQL and Heartbeat ports so that the servers can communicate with each other but no one else can access them.

# On db-01
sudo iptables -I INPUT 3 -p tcp --dport mysql -s db-02.vm.xeriom.net -j ACCEPT
sudo iptables -I INPUT 3 -p udp --dport mysql -s db-02.vm.xeriom.net -j ACCEPT
sudo iptables -I INPUT 3 -p udp --dport 694 -s db-02.vm.xeriom.net -j ACCEPT

# On db-02
sudo iptables -I INPUT 3 -p tcp --dport mysql -s db-01.vm.xeriom.net -j ACCEPT
sudo iptables -I INPUT 3 -p udp --dport mysql -s db-01.vm.xeriom.net -j ACCEPT
sudo iptables -I INPUT 3 -p udp --dport 694 -s db-01.vm.xeriom.net -j ACCEPT

Your firewall rules should now look something like below, the important lines being those ending in tcp dpt:mysql, udp dpt:mysql and dpt:694. The source for those lines should be the node that you're not checking the firewall rules on eg db-01 should have rules opening ports for db-02, and db-02 should have rules opening ports for db-01.

Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
ACCEPT     all  --  anywhere             anywhere            
ACCEPT     all  --  anywhere             anywhere            state RELATED,ESTABLISHED 
ACCEPT     udp  --  db-01                anywhere            udp dpt:694 
ACCEPT     tcp  --  db-01                anywhere            udp dpt:mysql 
ACCEPT     tcp  --  db-01                anywhere            tcp dpt:mysql 
ACCEPT     tcp  --  anywhere             anywhere            tcp dpt:ssh

All being well, save your firewall rules so they're restored at reboot.

sudo sh -c "iptables-save -c > /etc/iptables.rules"

For the purpose of this post, let's assume that the following IP addresses are available and assigned to the boxes in brackets.

  • 193.219.108.241 - db-01 (db-01.vm.xeriom.net)
  • 193.219.108.242 - db-02 (db-02.vm.xeriom.net)
  • 193.219.108.243 - Not assigned

Start small

To begin with we'll install and configure MySQL for normal use on each of the boxes.

sudo apt-get install mysql-server --yes

Set a strong MySQL root password and wait for the packages to download and install, then edit /etc/mysql/my.cnf to make MySQL listen on all IP addresses.

bind-address = 0.0.0.0

Now restart MySQL and fire up the MySQL command-line client to check all is good.

sudo /etc/init.d/mysql restart
mysql -u root -p
Enter password: [enter the MySQL root password you chose earlier]
mysql> \q

If you got the mysql> prompt then MySQL is running. Try connecting to the other node across the network to see if the firewall is opened and MySQL is listening on the network interface.

mysql -h db-02.vm.xeriom.net -u root -p
Enter password: [enter the MySQL root password you chose earlier]
ERROR 1130 (00000): Host 'db-01' is not allowed to connect to this MySQL server

If you got the above error then everything is working fine - MySQL connected and refused to authorise the client. We'll create some valid accounts for this later. If you got a different error (such as the one below), check MySQL is running on both boxes and that the firewall rules are allowing connections from the correct hosts.

Can't connect to MySQL server on 'db-02' (10061)

One-way replication

The first thing we want to do is setup a simple master-slave replication to see that it's possible to replicate data from one database host to the other. This requires a binary log so tell MySQL on db-01 to keep one. Edit /etc/mysql/my.cnf and set the following values under the replication section.

server-id               = 1 
log_bin                 = /var/log/mysql/mysql-bin.log
expire_logs_days        = 10
max_binlog_size         = 100M
binlog_do_db            = my_application
binlog_ignore_db        = mysql
binlog_ignore_db        = test

On db-01 grant replication slave rights to db-02. Change some_password to a real, strong password. Afterwards, make sure you restart MySQL.

mysql -u root -p
Enter password: [enter the MySQL root password you chose earlier]
mysql> grant replication slave on *.* to 'replication'@'db-02.vm.xeriom.net' identified by 'some_password';
mysql> \q
sudo /etc/init.d/mysql restart

Jump on to db-02 and set it up to replicate data from db-01 by editing /etc/mysql/my.cnf, again replacing the hostname, username and password with the values for db-01.

server-id                 = 2
master-host               = db-01.vm.xeriom.net
master-user               = replication
master-password           = some_password
master-port               = 3306

One way replication should now be setup. Restart MySQL and check the status of the slave on db-02. If the Slave_IO_State is "Waiting for master to send event" then you've been successful.

# Run this on db-02 only
sudo /etc/init.d/mysql restart
mysql -u root -p
Enter password: [enter the MySQL root password you chose earlier]
mysql> show slave status \G
*************************** 1. row ***************************
             Slave_IO_State: Waiting for master to send event
                Master_Host: 193.219.108.241
                Master_User: replication
                Master_Port: 3306
              Connect_Retry: 60
            Master_Log_File: mysql-bin.000005
        Read_Master_Log_Pos: 98
             Relay_Log_File: mysqld-relay-bin.000004
              Relay_Log_Pos: 235
      Relay_Master_Log_File: mysql-bin.000005
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
            Replicate_Do_DB: 
        Replicate_Ignore_DB: 
         Replicate_Do_Table: 
     Replicate_Ignore_Table: 
    Replicate_Wild_Do_Table: 
Replicate_Wild_Ignore_Table: 
                 Last_Errno: 0
                 Last_Error: 
               Skip_Counter: 0
        Exec_Master_Log_Pos: 98
            Relay_Log_Space: 235
            Until_Condition: None
             Until_Log_File: 
              Until_Log_Pos: 0
         Master_SSL_Allowed: No
         Master_SSL_CA_File: 
         Master_SSL_CA_Path: 
            Master_SSL_Cert: 
          Master_SSL_Cipher: 
             Master_SSL_Key: 
      Seconds_Behind_Master: 0
1 row in set (0.00 sec)

All being well it's time to test replication is working. We'll create the database we've configured replication for (my_application) on db-01 and watch as it appears on db-02 as well.

# On both nodes
mysql -u root -p
Enter password: [enter the MySQL root password you chose earlier]
mysql> show databases;

There should be two - mysql and test.

# On db-01 only
mysql -u root -p
Enter password: [enter the MySQL root password you chose earlier]
mysql> create database my_application;;
# On both nodes
mysql -u root -p
Enter password: [enter the MySQL root password you chose earlier]
mysql> show databases;

The new database, my_application should appear in the output of both nodes. Success! If it doesn't show on both nodes (it didn't for me the first time I set it up), here are some tips for finding out what's wrong.

Trouble-shooting one-way replication

If the slave status above doesn't show Slave_IO_State: Waiting for master to send event, Slave_IO_Running: Yes and Slave_SQL_Running: Yes then something is wrong. This happened a few times while I was setting up replication - here's how I debugged it.

Telnet is one of the best tools in the world for debugging connectivity issues. If you haven't already, install it now.

sudo apt-get install telnet

SSH to the node that you want to check connectivity from (db-02) and telnet to the other node (db-01) on the MySQL port (3306).

# on db-02
telnet db-01.vm.xeriom.net mysql

The problem I encountered was ERROR 1130 (00000): Host 'db-02' is not allowed to connect to this MySQL server. This happens when an incorrect hostname was used in the grant replication slave query above. In my case I had granted access to clients using the full hostname (db-02.vm.xeriom.net) but MySQL looked in /etc/hosts and found a short name (db-02). Run the grant replication slave query again using the hostname given in the error message.

# on db-01
mysql -u root -p
Enter password: [enter the MySQL root password you chose earlier]
mysql> grant replication slave on *.* to 'replication'@'db-02' identified by 'some_password';
mysql> \q
sudo /etc/init.d/mysql restart

Another problem I encountered was that the slave status remained "connecting to master" for a long time. If you can connect using telnet this is probably caused by the server-id being the same on both servers. Check in /etc/mysql/my.cnf and if necessary change the values and restart MySQL.

Master-master replication

The above setup will replicate data one-way, but if you happen to write to the slave (db-02) then at best the data stored in the databases will be inconsistent, and there's a large possibility that replication will fail from that point onwards.

Setting up the master database so that it replicates data back from the slave would allow us to have a consistent data-set on both databases regardless of which we updated.

On db-02 edit /etc/mysql/my.cnf and configure it to keep a binary log of updates to the appropriate databases.

log_bin                 = /var/log/mysql/mysql-bin.log
expire_logs_days        = 10
max_binlog_size         = 100M
binlog_do_db            = my_application
binlog_ignore_db        = mysql
binlog_ignore_db        = test

Jump into MySQL on db-02 and grant replication slave privileges to the replication user on db-01.

# On db-02
mysql -u root -p
Enter password: [enter the MySQL root password you chose earlier]
mysql> grant replication slave on *.* to 'replication'@'db-01.vm.xeriom.net' identified by 'some_password';

Next, edit db-01 to replicate data using this account. Edit /etc/mysql/my.cnf and set the values of the new master on db-02.

master-host               = db-02.vm.xeriom.net
master-user               = replication
master-password           = some_password
master-port               = 3306

Restart MySQL on both boxes and check that the slaves are reading from the appropriate master (db-01 reads from db-02 and db-02 reads from db-01).

sudo /etc/init.d/mysql restart
mysql -u root -p
Enter password: [enter the MySQL root password you chose earlier]
mysql> show slave status \G

If you don't get output that says Slave_IO_State: Waiting for master to send event, Slave_IO_Running: Yes and Slave_SQL_Running: Yes on both boxes then run through the trouble shooting section above.

If you've got this far your database is now running as a Master-Master cluster. Mmm, redundancy.

Heartbeat

The data is replicated two ways across the network so or data is protected against one host going down, but at the moment we still need to configure our applications to use one or the other host: failover must be handled by the application.

I wrote previously about using Heartbeat to provide a high availability web tier. We'll use the same technique to provide a floating IP address for the database. Our applications will connect to this IP address, and Heartbeat will make sure it's pointing at a live database. Since the databases are replicating data between each other it doesn't matter which database node our applications end up connecting to.

Install and configure Heartbeat on both boxes.

sudo apt-get install heartbeat

Next we'll copy and customise the authkeys, ha.cf and haresources files from the sample documentation to the configuration directory.

sudo cp /usr/share/doc/heartbeat/authkeys /etc/ha.d/
sudo sh -c "zcat /usr/share/doc/heartbeat/ha.cf.gz > /etc/ha.d/ha.cf"
sudo sh -c "zcat /usr/share/doc/heartbeat/haresources.gz > /etc/ha.d/haresources"

The authkeys should be readable only by root because it's going to contain a valuable password.

sudo chmod go-wrx /etc/ha.d/authkeys

Edit /ec/ha.d/authkeys and add a password of your choice so that it looks like below.

auth 2
2 sha1 your-password-here

Configure ha.cf according to your network. In this case the nodes are db-01.vm.xeriom.net and db-02.vm.xeriom.net. To figure out what your node names are run uname -n on each of the database boxes. The values you use in the node directives in the configuration file must match the names in uname -n.

logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
initdead 120
bcast eth0
udpport 694
auto_failback on
node db-01.vm.xeriom.net
node db-02.vm.xeriom.net

We need to tell Heartbeat we want it to look after MySQL. Edit haresources and make it look like the following - still on both machines.

db-01.vm.xeriom.net 193.219.108.243

This file must be identical on both nodes - even the hostname, which should be the output of uname -n on node 1. The IP address should be the unassigned IP address given above in the prelude section.

Start heartbeat on db-01 then db-02.

sudo /etc/init.d/heartbeat start

This process takes quite a while to start up. tail -f /var/log/ha-log on both boxes to watch what's happening. After a while you should see db-01 say something about completing acquisition.

heartbeat[7734]: 2008/07/07_17:19:34 info: Initial resource acquisition complete (T_RESOURCES(us))
IPaddr[7739]:   2008/07/07_17:19:37 INFO:  Running OK
heartbeat[7745]: 2008/07/07_17:19:37 info: Local Resource acquisition completed.

Testing it all works

Until now both boxes have been firewalled to allow MySQL connections only from each other. To prove that the database failover works we'll have to connect from another box, possibly your desktop or laptop. Find the public IP address of your chosen machine (here it's 193.214.108.10) and add it to the accept list on both boxes on the heartbeat IP address.

# On both boxes
sudo iptables -I INPUT 3 -p tcp --dport mysql -s 193.214.108.10 -d 193.214.108.243 -j ACCEPT

Create a user which you can use to query the database, again on both boxes.

# on both boxes
mysql -u root -p
Enter password: [enter the MySQL root password you chose earlier]
mysql> grant all, replication_client on my_application.* to 'some_user'@'193.214.108.10' identified by 'some_other_password';
mysql> \q

Now connect to the IP address Hearbeat is managing (193.214.108.243) from your test box and run a query to show the slave status.

mysql -u some_user -p -h 193.214.108.243 my_application
mysql> show slave status \G
*************************** 1. row ***************************
             Slave_IO_State: Waiting for master to send event
                Master_Host: 193.219.108.242
[unimportant lines snipped]

Note that the master host is db-02. Stop heartbeat (or shutdown db-01) and run the query again. You should now see that the master has changed to the IP address of the other node.

Finally, bring Heartbeat back up on db-01 (or start the box if you stopped it) and run the query again. The master host should be the same as the first time.

Auto increment offsets

To avoid problems if the replication process fails, check out avoiding auto_increment collision.

Love me!

If you've found this article useful I'd appreciate beer and recommendations at Working With Rails.

written by
Craig
published
2008-07-07
Disagree? Found a typo? Got a question?
If you'd like to have a conversation about this post, email craig@barkingiguana.com. I don't bite.
You can verify that I've written this post by following the verification instructions:
curl -LO http://barkingiguana.com/2008/07/07/high-availability-mysql-on-ubuntu-804.html.orig
curl -LO http://barkingiguana.com/2008/07/07/high-availability-mysql-on-ubuntu-804.html.orig.asc
gpg --verify high-availability-mysql-on-ubuntu-804.html.orig{.asc,}