
Implementing Agile in Data Warehouse Projects

Warren Sifre, BI Solution Architect at Allegient, has shared his views on implementing agile in data warehouse projects on his blog. He says that for this to work, a team with the right composition and mindset is required.

According to Warren, a data warehouse project involves the following development tasks:

  • Data Profiling and Data Modeling
  • ETL Development and Unit Testing
  • Semantic Layer Development and QA Testing
  • Report Builder Development

These tasks have natural dependencies between them, and the following questions should be considered when planning an agile implementation:

  • How can you develop reports without a complete data model?
  • How can you develop ETL without a complete data model?
  • How can you develop a Data Model without having the complete list of use cases?

For agile to work, there are a few concessions that need to be made prior to beginning. Warren suggests that:

  • The Data Model will never be final.
  • Full historical data loads will occur multiple times throughout the project or phase.
  • Changes in requirements will cause potential changes to everything.
  • Automated test development must be included in the project or phase.

He mentions that the key to implementing agile with these kinds of dependencies is coordination and breaking the work up into phases. Each phase should include a high-level overview of its goal. The first phase should be broken up into user stories with acceptance criteria and identified dependencies. The user stories should also include the level of effort to deploy to UAT/PROD environments. A combination of the dependencies and the effort provides a logical priority order for the phase's backlog.
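The ordering Warren describes can be sketched as a dependency-respecting sort with effort as a tiebreaker. This is a minimal illustration, not his actual process; the story names, dependencies, and effort scores are hypothetical:

```python
from graphlib import TopologicalSorter

# Hypothetical user stories for a first phase:
# name -> (effort to deploy to UAT/PROD, set of stories it depends on)
stories = {
    "data model":     (3, set()),
    "etl load":       (5, {"data model"}),
    "semantic layer": (2, {"etl load"}),
    "sales report":   (1, {"semantic layer"}),
}

def prioritize(stories):
    """Order a phase's backlog: dependencies first, lower effort breaks ties."""
    ts = TopologicalSorter({name: deps for name, (_, deps) in stories.items()})
    ts.prepare()
    ordered = []
    while ts.is_active():
        # Among the stories whose dependencies are done, do cheaper ones first.
        ready = sorted(ts.get_ready(), key=lambda s: stories[s][0])
        ordered.extend(ready)
        ts.done(*ready)
    return ordered

print(prioritize(stories))
# → ['data model', 'etl load', 'semantic layer', 'sales report']
```

In a real project the dependency graph would be wider, so the effort tiebreak would matter more than it does in this linear example.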

Mishkin Berteig, co-founder and president of Berteig Consulting Inc., shares his experience of implementing Scrum in a data warehouse project in a recent blog post. He coached a team working on a data warehouse migration from Oracle to Teradata; the organization had about 30 people allocated to the project. Before adopting Scrum, the team had done substantial up-front analysis work, which resulted in a dependency map among approximately 25,000 tables, views and ETL scripts. The dependency map was stored in an MS Access DB.

Mishkin proposed ordering the migration by business value instead of following the dependency map.

I decided to challenge the assumption about working based on dependencies.  I spoke with the Product Owner about the possible ways to order the work based on value.  We spoke about a few factors including:

  • retiring Oracle data warehouse licenses / servers,
  • retiring disk space / hardware,
  • and saving CPU time with new hardware

Mishkin, together with the Product Owner, ordered the MS Access DB by business value. This involved a fairly simple calculation based primarily on the disk space and the CPU time associated with each item in the DB. This database of 25,000 items became the Product Backlog.
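The value calculation could look something like the sketch below. The item names, numbers, and weights are invented for illustration; the source only says the score was based primarily on disk space and CPU time:

```python
# Hypothetical backlog items mirroring the MS Access DB: each table, view,
# or ETL script with the resources it consumes on the old platform.
items = [
    {"name": "sales_fact",   "disk_gb": 800, "cpu_hours": 120},
    {"name": "customer_dim", "disk_gb": 40,  "cpu_hours": 15},
    {"name": "audit_log",    "disk_gb": 500, "cpu_hours": 5},
]

def business_value(item, disk_weight=1.0, cpu_weight=2.0):
    # Weights are illustrative; a real calculation would reflect actual
    # license, hardware, and CPU costs being retired.
    return disk_weight * item["disk_gb"] + cpu_weight * item["cpu_hours"]

# Highest-value items migrate first: this ordered list is the Product Backlog.
backlog = sorted(items, key=business_value, reverse=True)
print([i["name"] for i in backlog])
# → ['sales_fact', 'audit_log', 'customer_dim']
```

Sorting by a scalar value score is what lets a 25,000-item database serve directly as a Product Backlog without hand-curating priorities.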

Mishkin asked the teams to treat the data warehouse tables as the Product Backlog Items (PBIs) and to keep both Oracle and Teradata running simultaneously in production, with updates every Sprint migrating data between the two platforms. The project saw significant benefits from this technique.

A couple Sprints later, the first of 5 Oracle licenses was retired, and the 2-year $20M project was a success, with nearly every Sprint going into production and with Oracle and Teradata running simultaneously until the last Oracle license was retired… The savings were huge due to the early delivery of value.
