A High Availability Architecture

Making our web applications resilient

Craig R Webster

Software Engineer, BBC A&Mi

This talk

Where we were

System Diagram

Apache on each box

Move Apache to a separate box. Apache already proxies to the application via HTTP, so why not do it across the network?

We can use the new Apache box to proxy back to several applications

High Availability Web Tier

Wait, a what now?

High Availability

After that change we have...

System Diagram

MySQL on each box

Move MySQL to a separate box. The application can easily talk to MySQL via TCP/IP.

We can use the new MySQL box to host several databases.

High Availability Database Tier

Same idea as Apache

After that change we have...

System Diagram

Assets served through the application server

Copy assets up to the High Availability Web Tier and invoke some mod_rewrite voodoo so Apache serves them itself and the request never reaches the application.

After that change we have...

System Diagram

Hard work in request-response cycle

Much faster to give client an IOU and ask something else to do the job later

Use an Enterprise Message Bus to send job requests

Great. What's an Enterprise Message Bus?

Enterprise Message Bus

Digital Fabric will provide pan-BBC bus. We should take advantage of that. Until then, Service Management provide this.
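
The IOU idea can be sketched in a few lines of Python. This is a minimal illustration only: an in-process `queue.Queue` stands in for the Enterprise Message Bus, and `handle_request`, `expensive_work` and `worker` are hypothetical names, not part of any BBC system. The point is simply that the request handler returns a job id straight away while a separate worker does the slow part later.

```python
import queue
import threading
import uuid

jobs = queue.Queue()            # stand-in for the message bus (assumption)
results = {}                    # job id -> result, filled in by the worker
results_lock = threading.Lock()


def handle_request(payload):
    """Enqueue the hard work and hand the client an IOU (a job id) immediately."""
    job_id = str(uuid.uuid4())
    jobs.put((job_id, payload))
    return job_id


def expensive_work(payload):
    # Hypothetical placeholder for the slow task.
    return payload.upper()


def worker():
    """Consume job requests outside the request-response cycle."""
    while True:
        job_id, payload = jobs.get()
        try:
            result = expensive_work(payload)
            with results_lock:
                results[job_id] = result
        finally:
            jobs.task_done()


# One background worker; a real deployment would run workers on other boxes.
threading.Thread(target=worker, daemon=True).start()
```

With a real bus the worker would live in a different process on a different machine, which is what lets the application tier stay fast under load.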

After that change we have...

System Diagram

One server per application

Add more application boxes! Apache's mod_proxy_balancer doesn't care: it will detect unavailable servers and drop them from the cluster. MySQL is already accessed across the network.
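
To make "detect and drop unavailable servers" concrete, here is a rough Python sketch of round-robin balancing that skips dead backends. It is not how mod_proxy_balancer is implemented; the `Balancer` class, the `is_alive` health check and the backend names are all assumptions made for illustration.

```python
import itertools


class Balancer:
    """Round-robin over backends, skipping any that fail a health check."""

    def __init__(self, backends, is_alive):
        self.backends = list(backends)
        self.is_alive = is_alive              # callable: backend -> bool
        self._cycle = itertools.cycle(self.backends)

    def pick(self):
        # Try each backend at most once per call.
        for _ in range(len(self.backends)):
            backend = next(self._cycle)
            if self.is_alive(backend):
                return backend
        raise RuntimeError("no application servers available")


# Hypothetical cluster with one box down.
down = {"app2:8080"}
balancer = Balancer(["app1:8080", "app2:8080", "app3:8080"],
                    lambda backend: backend not in down)
picks = [balancer.pick() for _ in range(4)]
```

Here `picks` alternates between `app1:8080` and `app3:8080`; once `app2:8080` is removed from `down` it rejoins the rotation without any other change.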

After that change we have...

System Diagram

User-generated content stored locally

Stop-gap: rsync assets to the web tier. Newly uploaded assets may not appear until the next rsync runs. Not a nice user experience.

Real solution: document store

Distributed Filestore

We went with MogileFS. Built by Danga for LiveJournal (massive traffic).

Assets served by application!

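Perlbal can "re-proxy" MogileFS: the application answers with the file's storage location, and Perlbal fetches and serves the file itself.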
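
As a rough sketch of the application's side of re-proxying: instead of streaming the file, the application returns an `X-REPROXY-URL` header listing where the file lives, and Perlbal does the transfer. The WSGI app below is illustrative only. `lookup_storage_urls` and the storage URL are hypothetical stand-ins for asking the MogileFS tracker where a key is stored, and Perlbal must be configured to allow re-proxying for this to have any effect.

```python
# Hypothetical key -> storage URL table; a real application would ask
# the MogileFS tracker for these paths instead.
STORAGE = {
    "avatar.png": ["http://storage1.example/dev1/avatar.png"],
}


def lookup_storage_urls(key):
    """Stand-in for a MogileFS path lookup (assumption, not the real client API)."""
    return STORAGE.get(key, [])


def application(environ, start_response):
    """WSGI app that hands the file transfer back to Perlbal."""
    key = environ.get("PATH_INFO", "").lstrip("/")
    urls = lookup_storage_urls(key)
    if not urls:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found\n"]
    # Perlbal reads this header and fetches the file from the listed URL.
    start_response("200 OK", [("X-REPROXY-URL", " ".join(urls))])
    return [b""]
```

The application server is tied up only for the lookup, not for the duration of the download.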

Perlbal

After that change we have...

System Diagram

Logs stored on each box

Send them across the network to a central log server. Application logs to syslog, syslog streams to log-host.
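
On the application side this can be as small as pointing the logger at the local syslog daemon. The sketch below uses Python's standard `logging.handlers.SysLogHandler` over UDP; the `build_logger` helper, the logger name and the `localhost:514` address are assumptions. Forwarding from the local daemon to log-host is syslog daemon configuration and is not shown.

```python
import logging
import logging.handlers


def build_logger(name, syslog_host="localhost", port=514):
    """Return a logger that sends records to a syslog daemon over UDP."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(address=(syslog_host, port))
    # Prefix each message with the application name so the central
    # log server can tell applications apart.
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


log = build_logger("myapp")     # "myapp" is a hypothetical application name
log.info("user signed in")
```

Because the application only ever talks to syslog, moving or replacing the central log server needs no application change.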

After that change we have...

System Diagram

No caching

Memcached provides an easy-to-use in-memory LRU cache.

Don't just cache pages... cache everything

That said, caching is hard to get right and it may not be appropriate to use an LRU cache for your data (think sessions). Start with the requests that hit your application hardest.
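
The session warning is easier to see with a toy LRU cache. The sketch below is a minimal illustration in Python, not Memcached's actual implementation; `LRUCache` and the keys are made up for the example. When the cache is full, the least recently used entry is silently evicted, which is fine for a rendered page but not for a user's session.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny least-recently-used cache with a fixed capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)          # mark as most recently used
        return self._items[key]

    def set(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)   # evict least recently used


cache = LRUCache(capacity=2)
cache.set("session:alice", {"logged_in": True})
cache.set("page:/home", "<html>home</html>")
cache.get("page:/home")
cache.set("page:/news", "<html>news</html>")  # cache is full: something must go
```

After the last `set`, `session:alice` has been evicted and Alice is logged out, even though nothing went wrong. A lost page can be regenerated; a lost session cannot.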

After that change we have...

System Diagram (Current at 2008-12-10)

Are we done?

Questions and contact details

Any questions?

Email me: craig@barkingiguana.com
My blog: http://barkingiguana.com/
These slides: http://barkingiguana.com/~craig/talks/2008/bbc/high-availability
All my talks (version controlled!): http://barkingiguana.com/~craig/talks.git