InfoQ

Introduction to BackgrounDRb

Posted by Ezra Zygmuntowicz on Jun 27, 2006 01:59 PM

Update, 3 March 2008: this article is outdated. Please refer to the documentation on the BackgrounDRb website.


Ruby on Rails is a great framework for developing many diverse types of web applications. As the problem domain of these web applications expands, you may need to run computationally intensive or long-running background tasks. This is a problem because you are constrained to the request/response cycle of HTTP. So how can you run these long background tasks without your web server timing out? And how do you display the progress to your users?

Enter BackgrounDRb. This is a Rails plugin I wrote recently as one way to solve this problem. Ruby includes DRb (Distributed Ruby) as part of the standard library. DRb provides a simple API for publishing and consuming Ruby objects over TCP/IP networks or Unix domain sockets. BackgrounDRb is a small framework that runs background tasks in a separate process from Rails, decoupling them from the request/response cycle. Through DRb, your Rails application can manage these tasks, for example to drive progress bars or status updates for your users.
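
To make the DRb part concrete, here is a minimal sketch of publishing and consuming an object with the standard library's drb. The Greeter class and the loopback address are illustrative assumptions, not part of BackgrounDRb. For brevity the server and client live in one process, whereas BackgrounDRb runs its server as a separate daemon.

```ruby
require "drb/drb"

# Hypothetical object to publish; it plays the role MiddleMan plays.
class Greeter
  def greet(name)
    "hello, #{name}"
  end
end

# Server side: publish the object. Port 0 asks the OS for a free port,
# and DRb.uri then reports the address actually bound.
DRb.start_service("druby://127.0.0.1:0", Greeter.new)
server_uri = DRb.uri

# Client side: a DRbObject is a proxy for the published object. When client
# and server are separate processes, each call is marshalled over TCP.
remote = DRbObject.new_with_uri(server_uri)
greeting = remote.greet("Rails")
puts greeting  # prints "hello, Rails"

DRb.stop_service
```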

The BackgrounDRb server works by publishing a MiddleMan object. This object is the manager for your worker classes. It holds a @jobs hash composed of { job_key => running_worker_object } pairs and a @timestamps hash composed of { job_key => timestamp } pairs. The MiddleMan object straddles the interface between the DRb server and your Rails application. Here is a simple diagram to show the architecture.
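
The bookkeeping described above can be sketched in plain Ruby. ToyMiddleMan and EchoWorker are hypothetical simplifications, not the plugin's code: here new_worker takes a class rather than a symbol such as :foo_worker, and the key format (16 hex characters from SecureRandom) is an assumption.

```ruby
require "securerandom"

# Hypothetical, simplified stand-in for MiddleMan: it models only the two
# hashes described above.
class ToyMiddleMan
  def initialize
    @jobs = {}        # { job_key => running_worker_object }
    @timestamps = {}  # { job_key => Time the job was registered }
  end

  # Instantiate the worker, register it under a random key, return the key.
  def new_worker(klass, args = nil)
    job_key = SecureRandom.hex(8)
    @jobs[job_key] = klass.new(args)
    @timestamps[job_key] = Time.now
    job_key
  end

  def get_worker(job_key)
    @jobs[job_key]
  end

  # Remove the job from both hashes.
  def delete_worker(job_key)
    @timestamps.delete(job_key)
    @jobs.delete(job_key)
  end
end

# Trivial worker class used only for this example.
class EchoWorker
  attr_reader :args

  def initialize(args)
    @args = args
  end
end

mm = ToyMiddleMan.new
key = mm.new_worker(EchoWorker, "some args")
puts mm.get_worker(key).args  # prints "some args"
mm.delete_worker(key)
p mm.get_worker(key)          # prints nil
```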

This is a generic worker class as created by the worker generator provided by the plugin.

$ script/generate worker Foo

class FooWorker < BackgrounDRb::Rails
  def do_work(args)
    # This method is called in its own new thread when you
    # call new_worker. args is set to the :args option.
  end
end

When your FooWorker object is instantiated from Rails via MiddleMan, its do_work method is automatically run in its own thread. The thread lets Rails continue without waiting for do_work to finish.
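
The thread-per-worker idea can be sketched as a base class whose constructor starts do_work in a new Thread and returns immediately. ThreadedWorker and SleepyWorker are hypothetical names for illustration; BackgrounDRb::Rails may differ in its details.

```ruby
# Hypothetical base class: constructing a worker starts do_work in a new
# Thread, so the caller is not blocked while the work runs.
class ThreadedWorker
  attr_reader :thread

  def initialize(args = nil)
    @thread = Thread.new { do_work(args) }
  end

  def do_work(_args)
    raise NotImplementedError, "subclasses implement do_work"
  end
end

class SleepyWorker < ThreadedWorker
  attr_reader :result

  def do_work(args)
    sleep 0.2          # stand-in for a long computation
    @result = args * 2
  end
end

started = Time.now
worker = SleepyWorker.new(21)
elapsed = Time.now - started
puts "constructor returned after #{elapsed.round(3)}s"
worker.thread.join     # only for the demo; a real caller would poll instead
puts worker.result     # prints 42
```

Joining the thread is only for the demonstration; a Rails controller would instead read the worker's state on later requests.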

With BackgrounDRb, you usually create a new worker object with an AJAX request. Your view can then use periodically_call_remote to fetch the progress of your job and display it however you like. Let's flesh out the FooWorker class and show how you would create a new FooWorker object and retrieve its progress from within a Rails controller.

class FooWorker < BackgrounDRb::Rails
  attr_reader :progress

  def do_work(args)
    @progress = 0
    calculate_the_meaning_of_life(args)
  end

  def calculate_the_meaning_of_life(args)
    while @progress < 100
      # calculations here
      @progress += 1
    end
  end
end

Now in the controller:

class MyController < ApplicationController
  def start_background_task
    session[:job_key] =
      MiddleMan.new_worker(:class => :foo_worker,
                           :args => "Arguments used to instantiate a new FooWorker object")
  end

  def get_progress
    if request.xhr?
      progress_percent = MiddleMan.get_worker(session[:job_key]).progress
      render :update do |page|
        page.call('progressPercent', 'progressbar', progress_percent)
        page.redirect_to(:action => 'done') if progress_percent >= 100
      end
    else
      redirect_to :action => 'index'
    end
  end

  def done
    render :text => "Your FooWorker task has completed"
    MiddleMan.delete_worker(session[:job_key])
  end
end

And in your start_background_task.rhtml view file you could use something like this:

<%= periodically_call_remote(:url => {:action => 'get_progress'}, :frequency => 1) %>
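
Outside Rails, the interplay between get_progress and periodically_call_remote amounts to a polling loop. The sketch below assumes a hypothetical ProgressWorker that advances a counter in a background thread and a hypothetical poll_until_done helper that plays the browser's role; the short interval replaces the one-second :frequency only to keep the example fast.

```ruby
# Hypothetical stand-in for FooWorker: progress advances in a background thread.
class ProgressWorker
  attr_reader :progress

  def initialize
    @progress = 0
    @thread = Thread.new do
      while @progress < 100
        sleep 0.001      # stand-in for one slice of real work
        @progress += 1
      end
    end
  end
end

# Plays the browser's role: read the progress at a fixed interval until the
# job reports 100%, recording every reading.
def poll_until_done(worker, interval: 0.01)
  samples = []
  loop do
    pct = worker.progress
    samples << pct
    break if pct >= 100
    sleep interval
  end
  samples
end

samples = poll_until_done(ProgressWorker.new)
puts "polled #{samples.size} times; last reading #{samples.last}"
```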

MiddleMan.new_worker returns a randomly generated job_key that you can store in the session for later retrieval. If you want to specify a named key instead of using the generated key, you can do so like this:

# This will raise a BackgrounDRbDuplicateKeyError if the :job_key already exists.
MiddleMan.new_worker(:class => :foo_worker,
                     :job_key => :my_worker,
                     :args => "Arguments used to instantiate a new FooWorker object")

# Later, retrieve the worker by its named key:
MiddleMan.get_worker :my_worker
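
Choosing between a caller-supplied key and a generated one, and refusing duplicates, can be sketched as follows. JobRegistry and DuplicateKeyError are hypothetical stand-ins for illustration; the plugin itself raises BackgrounDRbDuplicateKeyError, and its generated key format may differ.

```ruby
require "securerandom"

# Hypothetical error class standing in for BackgrounDRbDuplicateKeyError.
class DuplicateKeyError < StandardError; end

# Minimal sketch of job-key allocation: use the caller's :job_key if given,
# otherwise generate a random one; never overwrite an existing job.
class JobRegistry
  def initialize
    @jobs = {}
  end

  def new_worker(opts)
    key = opts[:job_key] || SecureRandom.hex(8)
    raise DuplicateKeyError, "job key #{key.inspect} already exists" if @jobs.key?(key)
    @jobs[key] = opts[:args]
    key
  end

  def get_worker(key)
    @jobs[key]
  end
end

registry = JobRegistry.new
registry.new_worker(:job_key => :my_worker, :args => "first")
begin
  registry.new_worker(:job_key => :my_worker, :args => "second")
rescue DuplicateKeyError => e
  puts e.message  # prints: job key :my_worker already exists
end
```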

Upon installation, the plugin writes a config file to RAILS_ROOT/config/backgroundrb.yml. This file contains a load_rails option; if it is set to true, you can use your ActiveRecord objects in your worker classes. When you start the server, it uses your existing database.yml file for database connection details.

This plugin can also cache large or compute-intensive objects, including ActiveRecord objects. You can store rendered views or the results of large queries in the cache; in fact, you can store any text or object that can be marshalled. Here is how you would use the cache:

# Fill the cache
@posts = Post.find(:all, :include => :comments)
MiddleMan.cache_as(:post_cache, @posts)
# OR
@posts = MiddleMan.cache_as :post_cache do
  Post.find(:all, :include => :comments)
end

# Retrieve the cache
@posts = MiddleMan.cache_get(:post_cache)
# OR
@posts = MiddleMan.cache_get(:post_cache) { Post.find(:all, :include => :comments) }

MiddleMan.cache_get takes an optional block argument. If the cache located at the :post_cache key is empty, the results of evaluating the block are placed in the cache and assigned to @posts. If you don't supply a block and the cache is empty it will return nil.
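
The hit/miss behaviour of cache_get can be sketched with an in-memory hash. ToyCache is hypothetical: the real cache lives inside the BackgrounDRb server, and the Marshal round-trip here only imitates the requirement that cached values be marshallable.

```ruby
# Hypothetical in-memory sketch of the cache_as / cache_get semantics.
class ToyCache
  def initialize
    @cache = {}
  end

  # Store value (or the block's result) under key and return it. The Marshal
  # round-trip stores an independent copy and, like a marshalled cache,
  # raises TypeError for objects such as Procs.
  def cache_as(key, value = nil)
    value = yield if block_given?
    @cache[key] = Marshal.load(Marshal.dump(value))
    value
  end

  # Return the cached value. On a miss, fill the cache from the block if one
  # is given; otherwise return nil.
  def cache_get(key)
    return @cache[key] if @cache.key?(key)
    return nil unless block_given?
    cache_as(key, yield)
  end

  def delete_cache(key)
    @cache.delete(key)
  end
end

cache = ToyCache.new
calls = 0
posts = cache.cache_get(:post_cache) { calls += 1; ["post 1", "post 2"] }
posts = cache.cache_get(:post_cache) { calls += 1; ["never used"] }
p posts                      # prints ["post 1", "post 2"]
p calls                      # prints 1
p cache.cache_get(:missing)  # prints nil
```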

In the current implementation, you are responsible for expiring your own caches and deleting your own workers from the main pool. There are two ways to do this. You can explicitly call MiddleMan.delete_worker(:job_key) or MiddleMan.delete_cache(:cache_key). Alternatively, MiddleMan.gc! takes a Time object and deletes all jobs with a timestamp older than the one specified. Here is a script that can be run from cron to expire jobs older than 30 minutes:

#!/usr/bin/env ruby
require "drb"
DRb.start_service
MiddleMan = DRbObject.new(nil, "druby://localhost:22222")
MiddleMan.gc!(Time.now - 60*30)
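
The rule gc! applies (drop every job whose timestamp is older than the cutoff) can be sketched over the two hashes MiddleMan keeps. JobTable is a hypothetical illustration, not the plugin's implementation.

```ruby
# Hypothetical sketch of timestamp-based cleanup in the spirit of gc!.
class JobTable
  attr_reader :jobs

  def initialize
    @jobs = {}
    @timestamps = {}
  end

  def add(key, worker, at: Time.now)
    @jobs[key] = worker
    @timestamps[key] = at
  end

  # Delete every job registered before cutoff; return the deleted keys.
  def gc!(cutoff)
    stale = @timestamps.select { |_key, ts| ts < cutoff }.keys
    stale.each do |key|
      @timestamps.delete(key)
      @jobs.delete(key)
    end
    stale
  end
end

now = Time.now
table = JobTable.new
table.add(:old_job, "worker A", at: now - 60 * 45)  # 45 minutes ago
table.add(:new_job, "worker B", at: now - 60 * 5)   # 5 minutes ago
p table.gc!(now - 60 * 30)  # prints [:old_job]
p table.jobs.keys           # prints [:new_job]
```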

In the near future there will be a timing mechanism built into BackgrounDRb. This will allow for jobs and garbage collection to be run at scheduled times and for specifying a time-to-live parameter when you create new jobs or caches.

There are Rake tasks as well as plain Ruby command line scripts to start and stop the daemon. On OS X, Linux, or BSD you can use the Rake tasks to start and stop the server:

$ rake backgroundrb:start
$ rake backgroundrb:stop

On Windows you currently have to keep a console window open while the BackgrounDRb server runs (hopefully this will change in the near future). To start the daemon on Windows, open a console and run:

> ruby script\backgroundrb\start
# ctrl-break to stop

So what are a few real world use cases, you ask? Here is a small list of things I am currently using BackgrounDRb for:

  • Downloading and caching RSS feeds for a feed aggregator.
  • Screen scraping automation using Watir to drive a web browser that navigates to other websites in the background to collect information.
  • Automating Xen VPS creation and sysadmin tasks.
  • Creating indexes in the background for the Hyper Estraier and Ferret search engines.
  • Bridging Rails and IRC bots.

Plans for the future include the ability to fork new processes to handle larger jobs that require their own Ruby interpreter instance. Work is also needed to let BackgrounDRb run as a Windows service; help from anyone familiar with Windows services would be greatly appreciated. Suggestions and patches are also welcome.

  • RubyForge project
  • Blog
  • Install as a plugin: script/plugin install svn://rubyforge.org//var/svn/backgroundrb


5 comments


    Missing javascript includes in view

    Jul 2, 2006 5:32 PM by Oliver Kiessler

    In order to make your example work one needs to include the rails javascript libraries in the "start_background_task.rhtml" view: <%= javascript_include_tag :defaults %>

    drb and novarug

    Jul 20, 2006 2:37 PM by Tom Copeland

    Brian Sletten just did a drb presentation at the Northern VA Ruby User's Group; slides and whatnot are on novarug.org.

    Corrections, for new versions

    Dec 8, 2006 12:06 PM by Olle Jonsson

    Development has been swift... and some of the code in the article is now out-of-date.

    progress_percent = MiddleMan.get_worker(session[:job_key]).progress
    
    Should nowadays just be:
    progress_percent = MiddleMan.worker(session[:job_key]).progress
    

    installing problem

    Mar 29, 2007 1:06 AM by andrew d

    when installing and creating new workers, remember to restart your rails server. I was trying to run the example but it didn't work, once I restarted WEBrick everything worked

    This Article is Obsolete

    Feb 11, 2008 12:50 PM by hemant kumar

    Folks, Above Article on BackgrounDRb is completely obsolete. Please do not use instructions provided here as a tutorial for using BackgrounDRb. It has caused enough pain and agony for many users. New documentation for bdrb is available at, http://backgroundrb.rubyforge.org and one should look there before referring anything else.
