There’s been a lot of chat on the LRUG list recently about job scheduling systems and process managers for offloading expensive tasks. BackgrounDRb, Beanstalk, Starling, BackgroundJob – all sorts of solutions have been thrown around. These systems have their place, but most of the time they’re adding complexity you just don’t need.
One case where I think they’re overkill is when you need to pull data from an external service on a schedule, completely disconnected from the HTTP request-response cycle.
Say you want to fetch the most recent article from this blog every 15 minutes and write it to a file that can be served statically. A straightforward implementation looks like this:
require 'net/http'
require 'uri'
require 'hpricot'

barking_iguana = URI.parse('http://barkingiguana.com/')

loop do
  # Grab the front page and parse it with Hpricot
  articles = Hpricot(Net::HTTP.get(barking_iguana))

  # The first bookmark link inside div.article is the latest post
  title = (articles / "div.article a[@rel=bookmark] text()").first
  link  = (articles / "div.article a[@rel=bookmark]").first['href']

  # Of course, this should have a real file path in it.
  File.open("/.../.../.../barking_iguana.ssi", "w+") do |f|
    f.write("#{title}: #{link}")
    f.flush
  end

  sleep 900 # 15 minutes
end
That’s it. No screwing around with complex scheduling infrastructure – just run it and it loops forever.
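If you want it out of your terminal, a couple of lines at the top of the script will do the trick. This is only a sketch, assuming Ruby 1.9 or later where Process.daemon lives in the standard library, and the pidfile path is purely illustrative:

# Sketch, assuming Ruby 1.9+: detach from the terminal and record our pid
# so whatever is monitoring the process can find it. Pidfile path is illustrative.
Process.daemon
File.open("/var/run/barking_iguana_fetcher.pid", "w") { |f| f.write(Process.pid) }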
“But what if it crashes?” Fair question. In the unlikely event that something this simple falls over, I’d have God watching the process so it gets restarted automatically. You’ve already got something monitoring your processes, right? Adding one more to the list is trivial.
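For the curious, a God watch for something like this is only a few lines. A minimal sketch: the watch name and script path are mine, not gospel, and if God starts the script for you it will daemonise and track the pid itself, so you can drop the Process.daemon lines above:

God.watch do |w|
  w.name  = "barking-iguana-fetcher"
  w.start = "ruby /usr/local/bin/barking_iguana_fetcher.rb"
  # Restart the process whenever it dies
  w.keepalive
end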
Sometimes the simplest solution really is the best one.