InfoQ

ActiveWarehouse, a New Step for Enterprise Ruby

Posted by Sebastien Auvray on Mar 28, 2007 12:00 PM

Community: Ruby
Topics: Ruby on Rails, Data Warehousing
Tags: Data Warehouse, Rails Plugins, ETL

Ruby and Rails continue to generate a lot of buzz in the overall software development community, but debate rages over how they will accommodate enterprise needs. For instance, is Rails able to handle large amounts of business data? With the new release of ActiveWarehouse, open-source Rails programmer Anthony Eden delivers a plugin that makes it easier to build data warehouses using Ruby on Rails. Right now, ActiveWarehouse is one of the most active RubyForge projects, and progress along the features roadmap looks impressive.

The ActiveWarehouse plugin simplifies building data warehouses in Rails. A data warehouse is a database designed specifically for analytical use, as opposed to operational transaction processing. Typically, a data warehouse houses data that spans several years and is sourced from numerous operational databases. Data warehouses are usually highly denormalized, in contrast to transactional systems, which tend to be normalized. The links in the sidebar provide additional information.
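To make the normalized/denormalized contrast concrete, here is a minimal plain-Ruby sketch. It does not use ActiveWarehouse; the data, the constant names, and the `denormalize` helper are all hypothetical and exist only for illustration.

```ruby
# Hypothetical operational (normalized) data: orders point at customers
# and products by id, so nothing is stored twice.
CUSTOMERS = {
  1 => { name: "Acme",   region: "EU" },
  2 => { name: "Globex", region: "US" }
}
PRODUCTS = {
  10 => { name: "Widget", category: "Hardware" }
}
ORDERS = [
  { customer_id: 1, product_id: 10, quantity: 3, amount: 30.0 },
  { customer_id: 2, product_id: 10, quantity: 1, amount: 10.0 }
]

# Flatten each order into one wide row that repeats the customer and
# product attributes. Analytical queries can then read a single row
# without joining back to the operational tables.
def denormalize(orders, customers, products)
  orders.map do |order|
    customer = customers.fetch(order[:customer_id])
    product  = products.fetch(order[:product_id])
    {
      customer_name:    customer[:name],
      customer_region:  customer[:region],
      product_name:     product[:name],
      product_category: product[:category],
      quantity:         order[:quantity],
      amount:           order[:amount]
    }
  end
end

warehouse_rows = denormalize(ORDERS, CUSTOMERS, PRODUCTS)
```

The trade-off is deliberate: the wide rows duplicate data, which would be a liability in a transactional system, but they make read-heavy reporting queries simpler.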

Notable features in this release include generators for facts, dimensions and cubes; multi-dimension support; and automatic creation of aggregates, among many others, with a full pipeline of planned enhancements on the way.
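For readers new to the terminology, the sketch below shows what an aggregate over facts and dimensions amounts to, in plain Ruby. It is not the ActiveWarehouse API: the `SALES_FACTS` data and the `aggregate` helper are assumptions made up for this example.

```ruby
# Hypothetical fact rows: each row carries dimension values (year, region)
# and a numeric measure (amount).
SALES_FACTS = [
  { year: 2006, region: "EU", amount: 100 },
  { year: 2006, region: "US", amount: 250 },
  { year: 2007, region: "EU", amount: 175 },
  { year: 2007, region: "EU", amount: 25 }
]

# Roll the measure up over the chosen dimensions, producing one total per
# distinct combination of dimension values. A warehouse typically stores
# such totals ahead of time so that reports do not rescan every fact row.
def aggregate(facts, dimensions, measure)
  totals = Hash.new(0)
  facts.each do |fact|
    key = dimensions.map { |dimension| fact.fetch(dimension) }
    totals[key] += fact.fetch(measure)
  end
  totals
end

by_year_and_region = aggregate(SALES_FACTS, [:year, :region], :amount)
by_year            = aggregate(SALES_FACTS, [:year], :amount)

by_year_and_region[[2007, "EU"]] # => 200
by_year[[2006]]                  # => 350
```

A cube, in this simplified picture, is the set of such roll-ups across the combinations of dimensions a report might ask for.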

To get the data from multiple data sources into the data warehouse, ActiveWarehouse is paired with the ActiveWarehouse ETL component.

The ETL component handles most of the basic source types that would be used when integrating a fairly current system (delimited, fixed-width, XML and database sources). It can also be extended with custom parsers, so source handling is covered for now. There are also enough transforms to be useful, and adding new ones is easy. The system is definitely extensible.
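As a rough illustration of what parsing delimited and fixed-width sources and applying transforms involves, here is a minimal plain-Ruby sketch. It is not ActiveWarehouse ETL code: `parse_delimited`, `parse_fixed_width`, `apply_transforms`, and the field layouts are hypothetical names chosen for this example.

```ruby
# Parse one delimited line into a hash keyed by field name.
# (A naive split; it does not handle quoted separators.)
def parse_delimited(line, fields, separator = ",")
  fields.zip(line.chomp.split(separator, -1)).to_h
end

# Parse one fixed-width line. The layout maps each field name to
# [offset, length] within the line; padding is stripped.
def parse_fixed_width(line, layout)
  layout.map { |name, (offset, length)| [name, line[offset, length].to_s.strip] }.to_h
end

# Apply each field's transforms in order. A transform is any callable that
# takes a value and returns the new value, so adding one means adding a lambda.
def apply_transforms(row, transforms)
  row.map do |field, value|
    [field, transforms.fetch(field, []).reduce(value) { |v, t| t.call(v) }]
  end.to_h
end

TRANSFORMS = {
  name:   [->(v) { v.strip }, ->(v) { v.upcase }],
  amount: [->(v) { Float(v.strip) }]
}

raw = parse_delimited("acme , 12.50\n", [:name, :amount])
row = apply_transforms(raw, TRANSFORMS)
# row is { name: "ACME", amount: 12.5 }
```

Keeping parsers and transforms as small independent callables is one simple way to get the kind of extensibility the article describes.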

Other notable features include virtual source fields, support for pre- and post-processing code, and ETL Domain Specific Language (DSL) control files. Bulk loading is available only for MySQL at the moment. Anthony is also working on performance issues, which are always crucial in this domain.
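To give a feel for how a Ruby DSL control file with pre- and post-processing hooks can work, here is a small sketch built on `instance_eval`. The syntax is invented for this example and should not be read as the actual ActiveWarehouse ETL control file format; `ControlFile` and its methods (`source`, `transform`, `pre_process`, `post_process`, `run`) are all hypothetical.

```ruby
# Collects the declarations made inside a control block, then runs them.
class ControlFile
  attr_reader :source_config, :transforms, :pre_hooks, :post_hooks

  def initialize
    @source_config = nil
    @transforms = Hash.new { |hash, field| hash[field] = [] }
    @pre_hooks = []
    @post_hooks = []
  end

  # Evaluate the block with the new control object as `self`, so bare
  # calls such as `source ...` resolve to the methods below.
  def self.define(&block)
    control = new
    control.instance_eval(&block)
    control
  end

  def source(options)
    @source_config = options
  end

  def transform(field, &block)
    @transforms[field] << block
  end

  def pre_process(&block)
    @pre_hooks << block
  end

  def post_process(&block)
    @post_hooks << block
  end

  # Run the pre hooks, transform every row, then run the post hooks.
  def run(rows)
    @pre_hooks.each(&:call)
    result = rows.map do |row|
      row.map do |field, value|
        [field, @transforms.fetch(field, []).reduce(value) { |v, t| t.call(v) }]
      end.to_h
    end
    @post_hooks.each(&:call)
    result
  end
end

log = []
control = ControlFile.define do
  source type: :delimited, file: "sales.csv" # hypothetical file name
  pre_process  { log << :started }
  transform(:amount) { |v| Float(v) }
  post_process { log << :finished }
end

loaded = control.run([{ name: "Acme", amount: "12.5" }])
# loaded is [{ name: "Acme", amount: 12.5 }] and log is [:started, :finished]
```

The appeal of this style is that the control file reads as a declaration of the job, while remaining ordinary Ruby that can hold arbitrary pre- and post-processing code.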

ActiveWarehouse and ETL component abilities are demonstrated through a comprehensive tutorial.

4 comments

  1. Looks Good

    Mar 28, 2007 1:27 PM by Frank Bolander

    Good job Anthony. The ETL stuff looks very interesting. Any plans on supporting MDX?

  2. Re: Looks Good

    Mar 29, 2007 1:32 PM by Anthony Eden

    No plans for MDX at the moment. ActiveWarehouse isn't a middleware app; rather, it is designed to be used directly from Rails. That's not to say that it couldn't become a standalone middleware app in the future; it's just not something I or the other authors need right now. Feel free to jump in and scratch any itches that you have, though. :-)

  3. Enterprise

    Mar 30, 2007 9:50 AM by Steven Devijver

    If data warehouse code is considered "enterprise" then many of the scripts I wrote in my Perl days are "enterprise" too. Whatever your connotation of "enterprise" is, what it boils down to is how much your VM can take under load. The JVM can, Erlang too, and there are others. The Ruby VM, however, is lagging many years and millions of dollars/euros behind. Don't take my word for it; ask the people who work on it.

  4. Re: Enterprise

    Jun 16, 2007 7:29 AM by Anthony Eden

    Steven, you are right, the performance of the Ruby interpreter *is* an issue in certain parts of ActiveWarehouse, specifically data aggregation for large data sets. I remember the same was true back in the early days of Java as well, so I have confidence that the performance of Ruby will continue to improve. One day we will have both performance and joy of development at the same time. ;-)
