ActiveWarehouse, a New Step for Enterprise Ruby

Ruby and Rails continue to generate a lot of buzz in the overall software development community, but debate rages over how they will accommodate enterprise needs. For instance, is Rails able to handle large amounts of business data? With the new release of ActiveWarehouse, open-source Rails programmer Anthony Eden delivers a plugin that makes it easier to build data warehouses using Ruby on Rails. Right now, ActiveWarehouse is one of the most active RubyForge projects, and progress along the features roadmap looks impressive.

The ActiveWarehouse plugin simplifies building data warehouses in Rails. A data warehouse is a database designed specifically for analytical use, as opposed to operational transaction processing. Typically a data warehouse houses data that spans several years and is sourced from numerous operational databases. Data warehouses are usually highly denormalized, in contrast to transactional systems, which tend to be normalized. The links in the sidebar provide additional information.

This release notably includes generators for facts, dimensions and cubes; multi-dimension support; automatic creation of aggregates; and many other features, with a full pipeline of planned enhancements on the way.
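To make the vocabulary concrete, here is a minimal plain-Ruby sketch of how facts, dimensions and aggregates relate to one another. It does not use the ActiveWarehouse API; every name in it (DateDim, StoreDim, SalesFact, rollup) is hypothetical and chosen for illustration only.

```ruby
# Hypothetical in-memory star schema. None of these names come from
# ActiveWarehouse; they only illustrate the concepts.
DateDim   = Struct.new(:id, :year, :month)
StoreDim  = Struct.new(:id, :region)
SalesFact = Struct.new(:date_id, :store_id, :amount)

# Dimensions: descriptive attributes, indexed by surrogate key.
DATES = [DateDim.new(1, 2006, 11), DateDim.new(2, 2006, 12)]
        .map { |d| [d.id, d] }.to_h
STORES = [StoreDim.new(1, "East"), StoreDim.new(2, "West")]
         .map { |s| [s.id, s] }.to_h

# Facts: numeric measures plus one foreign key per dimension.
FACTS = [
  SalesFact.new(1, 1, 100),
  SalesFact.new(1, 2, 250),
  SalesFact.new(2, 1, 75),
  SalesFact.new(2, 2, 50)
]

# Roll the facts up by one attribute from each dimension. An aggregate
# table stores this kind of summary so reports need not rescan the facts.
def rollup(facts, dates, stores)
  facts.each_with_object(Hash.new(0)) do |fact, totals|
    key = [dates.fetch(fact.date_id).year, stores.fetch(fact.store_id).region]
    totals[key] += fact.amount
  end
end

AGGREGATE = rollup(FACTS, DATES, STORES)
```

In this sketch the rollup is keyed by year and region; a cube would expose several such rollups over the same fact table.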

To get the data from multiple data sources into the data warehouse, ActiveWarehouse is paired with the ActiveWarehouse ETL component.

The ETL handles most of the basic source types that would be used when integrating a fairly current system (delimited, fixed-width, XML and database sources). It can also be extended with custom parsers, so source handling is covered for now. There are enough transforms to be useful, and adding new ones is easy. The system is definitely extensible.
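As a rough illustration of the parser-plus-transform idea, the following plain-Ruby sketch reads delimited and fixed-width text and runs each field through a chain of transforms. It is not the ActiveWarehouse ETL API or its control-file syntax; the helper names (parse_delimited, parse_fixed_width, apply_transforms) and the sample data are assumptions made for this example.

```ruby
# Hypothetical helpers, not the ActiveWarehouse ETL API.

# Parse delimited text into an array of { field => raw string } hashes.
def parse_delimited(text, fields, delimiter = ",")
  text.each_line.map { |line| fields.zip(line.chomp.split(delimiter)).to_h }
end

# Parse fixed-width text; layout maps field name => [offset, length].
def parse_fixed_width(text, layout)
  text.each_line.map do |line|
    layout.map { |name, (offset, length)| [name, line[offset, length].to_s.strip] }.to_h
  end
end

# Each field can have a chain of transforms, applied in order.
# Adding a new transform means adding another lambda to a chain.
TRANSFORMS = {
  name:   [->(v) { v.strip }, ->(v) { v.upcase }],
  amount: [->(v) { Integer(v, 10) }] # base 10, so "0010" is not read as octal
}

def apply_transforms(row, transforms)
  row.map do |field, value|
    [field, transforms.fetch(field, []).reduce(value) { |v, t| t.call(v) }]
  end.to_h
end

delimited_rows = parse_delimited("widget ,10\ngadget,25\n", [:name, :amount])
fixed_rows     = parse_fixed_width("widget  0010\n", name: [0, 8], amount: [8, 4])

# Both source types end up as the same cleaned row shape, ready to load.
LOADED = (delimited_rows + fixed_rows).map { |r| apply_transforms(r, TRANSFORMS) }
```

Keeping parsers and transforms as separate, swappable pieces is what makes this kind of pipeline easy to extend with a new source type or a new cleaning step.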

Other notable features include virtual source fields; support for pre- and post-processing code; and ETL Domain Specific Language (DSL) control files. Bulk loading is available only for MySQL at the moment. Anthony is also working on performance issues, which are always crucial in this domain.

The abilities of ActiveWarehouse and its ETL component are demonstrated in a comprehensive tutorial.
