InfoQ

Introduction to BackgrounDRb

Posted by Ezra Zygmuntowicz on Jun 27, 2006

Update 3rd March 2008: NOTE:  this article is outdated - please refer to the documentation on the BackgrounDrb website.


Ruby on Rails is a great framework for developing many diverse types of web applications. As the problem domain of these web applications expands, you may need to run computationally intensive or long-running background tasks. This poses a problem because you are constrained to work within the request/response cycle of HTTP. So how can you run these long background tasks without your web server timing out? And how do you display the progress to your users?

Enter BackgrounDRb. This is a Rails plugin I wrote recently as one way to solve this problem. Ruby includes DRb (Distributed Ruby) as part of the standard library. DRb provides a simple API for publishing and consuming Ruby objects over TCP/IP networks or Unix domain sockets. BackgrounDRb is a small framework that facilitates running background tasks in a separate process from Rails, thereby decoupling them from the request/response cycle. With DRb you can manage your tasks from Rails, using hooks to show progress bars or status updates to your users.
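
To make the publish/consume idea concrete, here is a minimal sketch that uses only Ruby's standard drb library. The Greeter class, the SERVICE_URI constant and the remote_greeting helper are hypothetical names chosen for illustration; they are not part of BackgrounDRb. For brevity the server and the client live in one script, whereas in practice they would be separate processes.

```ruby
require "drb"

# Hypothetical object to publish; any Ruby object can act as the front object.
class Greeter
  def greet(name)
    "Hello, #{name}"
  end
end

# Publish the object. Port 0 asks the OS for any free port;
# DRb.uri then reports the address that was actually bound.
DRb.start_service("druby://127.0.0.1:0", Greeter.new)
SERVICE_URI = DRb.uri

# A client (normally another process) needs only the URI to call methods.
def remote_greeting(uri, name)
  DRbObject.new_with_uri(uri).greet(name)
end

puts remote_greeting(SERVICE_URI, "Rails")
```

BackgrounDRb follows the same pattern, except that the published object is its MiddleMan and the client is your Rails application.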

The BackgrounDRb server works by publishing a MiddleMan object. This object is the manager for your worker classes. It holds a @jobs hash composed of { job_key => running_worker_object } pairs and a @timestamps hash composed of { job_key => timestamp } pairs. The MiddleMan object straddles the interface between the DRb server and your Rails application. Here is a simple diagram to show the architecture.
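
The bookkeeping described above can be sketched roughly as follows. This is not the plugin's source: SketchMiddleMan is a hypothetical stand-in, its new_worker takes an already built worker object rather than the :class and :args options of the real API, and generating keys with SecureRandom is an assumption.

```ruby
require "securerandom"

# Hypothetical, simplified stand-in for the manager object.
class SketchMiddleMan
  def initialize
    @jobs = {}        # job_key => running worker object
    @timestamps = {}  # job_key => Time the worker was registered
  end

  # Register a worker and hand back the key used to find it later.
  def new_worker(worker)
    job_key = SecureRandom.hex(8) # assumption: any unique key would do
    @jobs[job_key] = worker
    @timestamps[job_key] = Time.now
    job_key
  end

  def get_worker(job_key)
    @jobs[job_key]
  end

  # Remove the worker and its timestamp; returns the removed worker.
  def delete_worker(job_key)
    @timestamps.delete(job_key)
    @jobs.delete(job_key)
  end
end
```

Keeping the timestamps in a second hash, keyed the same way as the jobs, is what makes age-based cleanup possible without touching the worker objects themselves.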

This is a generic worker class as created by the worker generator provided by the plugin.

$ script/generate worker Foo

class FooWorker < BackgrounDRb::Rails
  def do_work(args)
    # This method is called in its own new thread when you
    # call MiddleMan.new_worker. args is set to the value passed as :args.
  end
end

When your FooWorker object is instantiated from Rails via MiddleMan, the do_work method is automatically run in its own thread. We use a thread here so that Rails does not wait for the do_work method to finish before it continues.
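
A minimal sketch of that thread-per-worker idea, in plain Ruby. SketchWorker and SlowWorker are hypothetical classes; the real BackgrounDRb::Rails base class may be implemented differently.

```ruby
# Hypothetical base class: starts do_work in a new thread on instantiation.
class SketchWorker
  attr_reader :thread

  def initialize(args = nil)
    # The thread runs in the background, so `new` returns right away.
    @thread = Thread.new { do_work(args) }
  end

  def do_work(args)
    raise NotImplementedError, "subclasses implement do_work"
  end
end

class SlowWorker < SketchWorker
  attr_reader :result

  def do_work(args)
    sleep 0.2           # stands in for a long-running task
    @result = args * 2
  end
end

worker = SlowWorker.new(21) # returns immediately
worker.thread.join          # block until done (for demonstration only)
puts worker.result          # prints 42
```

In a web request you would not call join; you would return to the user and poll the worker later, which is what the controller example in this article does.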

With BackgrounDRb, you usually create a new worker object with an AJAX request. Your view can then use periodically_call_remote to fetch the progress of your job and display it however you like. Let's flesh out the FooWorker class and show how you would create a new FooWorker object and retrieve its progress from within a Rails controller.

class FooWorker < BackgrounDRb::Rails
  attr_reader :progress

  def do_work(args)
    @progress = 0
    calculate_the_meaning_of_life(args)
  end

  def calculate_the_meaning_of_life(args)
    while @progress < 100
      # calculations here
      @progress += 1
    end
  end
end

Now in the controller:

class MyController < ApplicationController
  def start_background_task
    session[:job_key] =
      MiddleMan.new_worker(:class => :foo_worker,
                           :args => "Arguments used to instantiate a new FooWorker object")
  end

  def get_progress
    if request.xhr?
      progress_percent = MiddleMan.get_worker(session[:job_key]).progress
      render :update do |page|
        page.call('progressPercent', 'progressbar', progress_percent)
        page.redirect_to(:action => 'done') if progress_percent >= 100
      end
    else
      redirect_to :action => 'index'
    end
  end

  def done
    render :text => "Your FooWorker task has completed"
    MiddleMan.delete_worker(session[:job_key])
  end
end

And in your start_background_task.rhtml view file you could use something like this:

<%= periodically_call_remote(:url => { :action => 'get_progress' },
                             :frequency => 1) %>

MiddleMan.new_worker returns a randomly generated job_key that you can store in the session for later retrieval. If you want to specify a named key instead of using the generated key you can do so like this:

# This will raise a BackgrounDRbDuplicateKeyError if the :job_key already exists.
MiddleMan.new_worker(:class => :foo_worker,
                     :job_key => :my_worker,
                     :args => "Arguments used to instantiate a new FooWorker object")

MiddleMan.get_worker :my_worker
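
The key handling described above might look something like this sketch. The register_job helper and the SketchDuplicateKeyError class are hypothetical names for illustration, and the use of SecureRandom for generated keys is an assumption rather than the plugin's documented behaviour.

```ruby
require "securerandom"

# Hypothetical error class, standing in for the duplicate-key error.
class SketchDuplicateKeyError < StandardError; end

# Store a worker under the requested key, or under a random key if none
# is given. Refuses to overwrite an existing entry.
def register_job(jobs, worker, job_key = nil)
  job_key ||= SecureRandom.hex(8)
  if jobs.key?(job_key)
    raise SketchDuplicateKeyError, "job key already in use: #{job_key.inspect}"
  end
  jobs[job_key] = worker
  job_key
end

jobs = {}
register_job(jobs, :first_worker, :my_worker) # named key
register_job(jobs, :second_worker)            # generated key
```

A named key is convenient for long-lived, singleton-style workers; a generated key suits per-user jobs whose key lives in the session.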

Upon installation, the plugin writes a config file to RAILS_ROOT/config/backgroundrb.yml. This file contains a load_rails config option. If it is set to true, you will be able to use your ActiveRecord objects in your worker classes. When you start the server, it will use your existing database.yml file for database connection details.

This plugin can also be used for caching large or compute-intensive objects including ActiveRecord objects. You can store rendered views or large queries in the cache. In fact you can store any text or object that can be marshalled. Here is how you would use the cache:

# Fill the cache
@posts = Post.find(:all, :include => :comments)
MiddleMan.cache_as(:post_cache, @posts)
# OR
@posts = MiddleMan.cache_as :post_cache do
  Post.find(:all, :include => :comments)
end

# Retrieve the cache
@posts = MiddleMan.cache_get(:post_cache)
# OR
@posts = MiddleMan.cache_get(:post_cache) { Post.find(:all, :include => :comments) }

MiddleMan.cache_get takes an optional block argument. If the cache located at the :post_cache key is empty, the results of evaluating the block are placed in the cache and assigned to @posts. If you don't supply a block and the cache is empty it will return nil.
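
The fetch-or-fill behaviour described above can be sketched with a plain hash. SketchCache is a hypothetical in-memory class written for illustration; it treats a missing or nil entry as "empty", which is an assumption about the plugin's semantics.

```ruby
# Hypothetical in-memory cache mirroring the behaviour described above.
class SketchCache
  def initialize
    @store = {}
  end

  # Store either the block's result or the given value, and return it.
  def cache_as(key, value = nil)
    @store[key] = block_given? ? yield : value
  end

  # Return the cached value. On a miss, fill the cache from the block
  # if one is given; otherwise return nil.
  def cache_get(key)
    cached = @store[key]
    return cached unless cached.nil?
    return nil unless block_given?
    @store[key] = yield
  end
end

cache = SketchCache.new
cache.cache_get(:post_cache)                         # miss, no block: nil
cache.cache_get(:post_cache) { ["post 1", "post 2"] } # miss: block fills cache
cache.cache_get(:post_cache)                         # hit: block not needed
```

The block form keeps the "check, compute, store" sequence in one call, so the expensive query runs only when the cache is empty.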

In the current implementation, you are responsible for expiring your own caches and deleting your own workers from the main pool. There are two ways to do this. You can explicitly call MiddleMan.delete_worker(:job_key) or MiddleMan.delete_cache(:cache_key), or you can call MiddleMan.gc!, which takes a Time object and deletes all jobs with a timestamp older than the one specified. Here is a script that can be run from cron to expire jobs older than 30 minutes:

#!/usr/bin/env ruby
require "drb"
DRb.start_service
MiddleMan = DRbObject.new(nil, "druby://localhost:22222")
MiddleMan.gc!(Time.now - 60*30)
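
For intuition, timestamp-based cleanup of this kind can be sketched as below. The sweep_jobs! helper is hypothetical and operates on two plain hashes shaped like the @jobs and @timestamps hashes described earlier; it is not the plugin's gc! implementation.

```ruby
# Hypothetical helper: delete every job whose timestamp is older than
# the cutoff, and return the keys that were removed.
def sweep_jobs!(jobs, timestamps, cutoff)
  expired = timestamps.select { |_key, stamp| stamp < cutoff }.keys
  expired.each do |key|
    jobs.delete(key)
    timestamps.delete(key)
  end
  expired
end

now = Time.now
jobs = { :old_job => "worker A", :fresh_job => "worker B" }
timestamps = { :old_job => now - 60 * 60, :fresh_job => now - 60 }

sweep_jobs!(jobs, timestamps, now - 60 * 30) # removes only :old_job
```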

In the near future there will be a timing mechanism built into BackgrounDRb. This will allow for jobs and garbage collection to be run at scheduled times and for specifying a time-to-live parameter when you create new jobs or caches.

There are Rake tasks as well as plain Ruby command-line scripts to start and stop the daemon. On OS X, Linux or BSD you can use the Rake tasks to start and stop the server:

$ rake backgroundrb:start
$ rake backgroundrb:stop

On Windows you currently have to keep a console window open while the BackgrounDRb server runs (hopefully this will change in the near future). To start the daemon on Windows, open a console and run:

> ruby script\backgroundrb\start
# ctrl-break to stop

So what are a few real world use cases, you ask? Here is a small list of things I am currently using BackgrounDRb for:

  • Downloading and caching RSS feeds for a feed aggregator.
  • Screen-scraping automation using Watir to drive a web browser that navigates to other websites in the background to collect information.
  • Automating Xen VPS creation and sysadmin tasks.
  • Creating indexes in the background for the Hyper Estraier and Ferret search engines.
  • Bridging Rails and IRC bots.

Plans for the future include the ability to fork new processes to handle larger jobs that require their own Ruby interpreter instance. Work is also needed to let BackgrounDRb run as a Windows service; help from anyone familiar with Windows services would be greatly appreciated. Suggestions and patches are also welcome.

  • rubyforge project
  • Blog
  • install as plugin: script/plugin install svn://rubyforge.org//var/svn/backgroundrb

    Missing javascript includes in view

    Jul 2, 2006 5:32 PM by Oliver Kiessler

    In order to make your example work one needs to include the rails javascript libraries in the "start_background_task.rhtml" view:

    <%= javascript_include_tag :defaults %>

    drb and novarug

    Jul 20, 2006 2:37 PM by Tom Copeland

    Brian Sletten just did a drb presentation at the Northern VA Ruby User's Group; slides and whatnot are on novarug.org.

    Corrections, for new versions

    Dec 8, 2006 12:06 PM by Olle Jonsson

    Development has been swift... and some of the code in the article is now out-of-date.


    progress_percent = MiddleMan.get_worker(session[:job_key]).progress


    Should nowadays just be:


    progress_percent = MiddleMan.worker(session[:job_key]).progress

    installing problem

    Mar 29, 2007 1:06 AM by andrew d

    When installing the plugin and creating new workers, remember to restart your Rails server. I was trying to run the example but it didn't work; once I restarted WEBrick, everything worked.

    This Article is Obsolete

    Feb 11, 2008 12:50 PM by hemant kumar

    Folks,

    The above article on BackgrounDRb is completely obsolete. Please do not use the instructions provided here as a tutorial for using BackgrounDRb. It has caused enough pain and agony for many users. New documentation for bdrb is available at backgroundrb.rubyforge.org, and one should look there before referring to anything else.
