InfoQ

Article

Introduction to BackgrounDRb

Posted by Ezra Zygmuntowicz on Jun 27, 2006 01:59 PM

Community
Ruby
Topics
Ruby on Rails ,
Programming
Tags
Rails Plugins ,
DRb ,
Service Design

Update 3rd March 2008: NOTE:  this article is outdated - please refer to the documentation on the BackgrounDrb website.


Ruby on Rails is a great framework for developing many diverse types of web applications. As the problem domain of these web applications expands, you may need to run computationally intensive or long running background tasks. This poses a problem in that you are constrained to work within the request/response cycle of HTTP. So how can you run these long background tasks without your web server timing out? And how do you display the progress to your users?

Enter BackgrounDRb. This is a Rails plugin I wrote recently as one way to solve this problem. Ruby includes DRb (Distributed Ruby) as part of the standard library. DRb provides a simple API for publishing and consuming Ruby objects over TCP/IP networks or unix domain sockets. BackgrounDRb is a small framework that facilitates running background tasks in a separate process from Rails, thereby decoupling them from the request/response cycle. With DRb you can manage your tasks from Rails using hooks for progress bars or status updates to your users.

The BackgrounDRb server works by publishing a MiddleMan object. This object is the manager for your worker classes. It holds a @jobs hash composed of { job_key => running_worker_object } pairs and a @timestamps hash composed of { job_key => timestamp } pairs. The MiddleMan object straddles the interface between the DRb server and your Rails application. Here is a simple diagram to show the architecture.

This is a generic worker class as created by the worker generator provided by the plugin.

$ script/generate worker Foo
class FooWorker < BackgrounDRb::Rails
def do_work(args)
# This method is called in its own new thread when you
# call new worker. args is set to :args
end

end

When your FooWorker object is instantiated from rails via MiddleMan, the do_work method is automatically run in its own thread. We use a thread here so rails does not wait for the do_work method to finish before it continues on.

With BackgrounDRb, you usually create a new worker object with an AJAX request. Your view can then use periodically_call_remote to fetch the progress of your job and display it however you like. Let's flesh out the FooWorker class and show how you would create a new FooWorker object and retrieve its progress from within a rails controller.

class FooWorker < BackgrounDRb::Rails
attr_reader :progress
def do_work(args)
@progress = 0
calculate_the_meaning_of_life(args)
end
def calculate_the_meaning_of_life(args)
while @progress < 100
# calculations here
@progress += 1
end
end
end

Now in the controller:

class MyController < ApplicationController
def start_background_task
session[:job_key] =
MiddleMan.new_worker(:class => :foo_worker,
:args => "Arguments used to instantiate a new FooWorker object")
end
def get_progress
if request.xhr?
progress_percent = MiddleMan.get_worker(session[:job_key]).progress
render :update do |page|
page.call('progressPercent', 'progressbar', progress_percent)
page.redirect_to( :action => 'done') if progress_percent >= 100
end
else
redirect_to :action => 'index'
end
end
def done
render :text => "

Your FooWorker task has completed

"
MiddleMan.delete_worker(session[:job_key])
end
end

And in your start_background_task.rhtml view file you could use something like this:









 

<%= periodically_call_remote(:url => {:action =>
'get_progress'}, :frequency => 1) %>

MiddleMan.new_worker returns a randomly generated job_key that you can store in the session for later retrieval. If you want to specify a named key instead of using the generated key you can do so like this:

 # This will throw a BackgrounDRbDuplicateKeyError if the :job_key already exists.
MiddleMan.new_worker(:class => :foo_worker,
:job_key => :my_worker,
:args => "Arguments used to instantiate a new FooWorker object")

MiddleMan.get_worker :my_worker

Upon instalation, the plugin writes a config file into RAILS_ROOT/config/backgroundrb.yml. In this file there is a load_rails config option. If this is set to true then you will be able to use your ActiveRecord objects in your worker classes. When you start the server it will use your already existing database.yml file for database connection details.

This plugin can also be used for caching large or compute-intensive objects including ActiveRecord objects. You can store rendered views or large queries in the cache. In fact you can store any text or object that can be marshalled. Here is how you would use the cache:

# Fill the cache
@posts = Post.find(:all, :include => :comments)
MiddleMan.cache_as(:post_cache, @posts)
# OR
@posts = MiddleMan.cache_as :post_cache do
Post.find(:all, :include => :comments)
end

# Retrieve the cache
@posts = MiddleMan.cache_get(:post_cache)
# OR
@posts = MiddleMan.cache_get(:post_cache) { Post.find(:all, :include => :comments) }

MiddleMan.cache_get takes an optional block argument. If the cache located at the :post_cache key is empty, the results of evaluating the block are placed in the cache and assigned to @posts. If you don't supply a block and the cache is empty it will return nil.

In the current implementation, you are responsible for expiring your own caches and deleting your own workers from the main pool. This works two ways. You can either explicitly call MiddleMan.delete_worker(:job_key) or MiddleMan.delete_cache(:cache_key). There is also a MiddleMan.gc! method that takes a Time object and deletes all jobs with a time-stamp older than the one specified. Here is a script that can be run from cron to expire jobs older than 30 minutes:

#!/usr/bin/env ruby
require "drb"
DRb.start_service
MiddleMan = DRbObject.new(nil, "druby://localhost:22222")
MiddleMan.gc!(Time.now - 60*30)

In the near future there will be a timing mechanism built into BackgrounDRb. This will allow for jobs and garbage collection to be run at scheduled times and for specifying a time-to-live parameter when you create new jobs or caches.

There are Rake tasks as well as plain Ruby command line scripts to start and stop the daemon. On OS X, linux or BSD you can use the Rake tasks to start and stop the server:

$ rake backgroundrb:start
$ rake backgroundrb:stop

On Windows you currently have to keep a console window open while you run the backgroundrb server (Hopefully this will change in the near future). So on Windows, to start the daemon you would open a console and run the command like this:

> ruby script\backgroundrb\start
# ctrl-break to stop

So what are a few real world use cases, you ask? Here is a small list of things I am currently using BackgrounDRb for:

  • Downloading and caching RSS feeds for a feed aggregator.
  • Screen scraping automation using watir to drive a web browser that navigates to other websites in the background to collect information.
  • Automating Xen VPS creation and sysadmin tasks.
  • Creating indexes in the background for Hyper Estraier and ferret search technologies.
  • Bridging Rails and IRC bots.

Plans for the future include the ability to fork new processes to handle larger jobs that require their own Ruby interpreter instance. Also work needs to be done to let BackgrounDRb run as a Windows service. Anyone who is familiar with Windows services that can offer some help here would be greatly appreciated. Suggestions and patches are also welcome.

  • rubyforge project
  • Blog
  • install as plugin: script/plugin install svn://rubyforge.org//var/svn/backgroundrb

Update 3rd March 2008: NOTE:  this article is outdated - please refer to the documentation on the BackgrounDrb website.

Missing javascript includes in view by Oliver Kiessler Posted Jul 2, 2006 5:32 PM
drb and novarug by Tom Copeland Posted Jul 20, 2006 2:37 PM
Corrections, for new versions by Olle Jonsson Posted Dec 8, 2006 12:06 PM
installing problem by andrew d Posted Mar 29, 2007 1:06 AM
This Article is Obsolete by hemant kumar Posted Feb 11, 2008 12:50 PM
  1. Back to top

    Missing javascript includes in view

    Jul 2, 2006 5:32 PM by Oliver Kiessler

    In order to make your example work one needs to include the rails javascript libraries in the "start_background_task.rhtml" view: <%= javascript_include_tag :defaults %>

  2. Back to top

    drb and novarug

    Jul 20, 2006 2:37 PM by Tom Copeland

    Brian Sletten just did a drb presentation at the Northern VA Ruby User's Group; slides and whatnot are on novarug.org.

  3. Back to top

    Corrections, for new versions

    Dec 8, 2006 12:06 PM by Olle Jonsson

    Development has been swift... and some of the code in the article is now out-of-date.

    progress_percent = MiddleMan.get_worker(session[:job_key]).progress
    
    Should nowadays just be:
    progress_percent = MiddleMan.worker(session[:job_key]).progress
    

  4. Back to top

    installing problem

    Mar 29, 2007 1:06 AM by andrew d

    when installing and creating new workers, remember to restart your rails server. I was trying to run the example but it didn't work, once I restarted WEBrick everything worked

  5. Back to top

    This Article is Obsolete

    Feb 11, 2008 12:50 PM by hemant kumar

    Folks, Above Article on BackgrounDRb is completely obsolete. Please do not use instructions provided here as a tutorial for using BackgrounDRb. It has caused enough pain and agony for many users. New documentation for bdrb is available at, http://backgroundrb.rubyforge.org and one should look there before referring anything else.

Educational Content

Bindings, Platforms, and Innovation

This presentation focuses on the Internet and separating myth from fact, history from the future, and the mundane from the imaginative. Bob Frankston presents a vision of what could and should be.

Orchestrating Long Running Activities with JBoss / JBPM

This article explores the use of JBoss and jBPM to implement design solutions that effectively address the issue of orchestrating long running activities.

Neo4j - The Benefits of Graph Databases

This presentation covers the use of graph databases as an optimal solution for data that is difficult to fit in static tables, rapidly evolving data or data that has a lot of optional attributes.

Realistic about Risk: Software development with Real Options

This session introduces Real Options and shows how it can help in running your project. Real Options is a decision-making process that can be used to manage risk.

Communication Flexibility Using Bindings

This article discusses the use of bindings on services and references (including the instance of non-configured bindings) as the means to implement SCA communications in a Web and SOA environment.

Writing DSLs in Groovy

After a short introduction to DSLs, Scott Davis plays with the keyboard showing how to approach the creation of a DSL by typing working snippets of Groovy code that get executed.

Scaling Agile with C/ALM (Collaborative Application Lifecycle Management)

IBM Rational and InfoQ present, Scaling Agile with C/ALM, an eBook showing organizations how to become “finely tuned software delivery machines” by enabling team integration and scaling.

Concurrent Programming with Microsoft F#

Amanda Laucher presents a real life enterprise application written in F#. She shows actual code snippets, explaining design decisions and suggesting how to use some of the F# constructs.