InfoQ

News

Event Stream Processing: Scalable Alternative to Data Warehouses?

Posted by Sadek Drobi on Oct 31, 2008

Sections: Architecture & Design, Development, Enterprise Architecture
Topics: Enterprise Architecture, Architecture, Data Warehousing, Events
Tags: Event Stream Processing, Data Warehouse, Scalability

On his blog, Dan Pritchett suggests an alternative to data warehousing applications. Although wary of “solutions that can only be implemented in a single address and storage space”, he acknowledges that data sometimes needs to be aggregated before it can be analyzed. This is precisely what data warehousing applications do: they aggregate information along a variety of axes and invert relationships in the data. According to Pritchett, however, using them has significant downsides. Not only are data warehousing applications expensive and “often out of the reach of smaller organizations”, but the way Extract, Transform and Load (ETL) software works also hurts both scalability and reactivity:

First, the ETL places a significant load on your production databases. If your business has nice offline windows for the ETL, that's great, but if not, managing the scale becomes a challenge. Second, the freshness of the warehouse is typically 24 hours behind or more. As your business grows this lag will grow as well.

Dan Pritchett believes there could be a cheaper and more scalable approach: processing streams of events with an Event Stream Processor (ESP).
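To make the contrast with batch ETL concrete, here is a minimal Python sketch. The `AccountStore` class and the `AccountBalanceChanged` event are hypothetical illustrations, not anything from Pritchett's post. `batch_total` rescans every row, as an ETL job would, while `RunningTotal` receives one event per change and updates the metric incrementally, so it is current after every write.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class AccountBalanceChanged:
    """Hypothetical business event emitted whenever a balance changes."""
    account_id: str
    old_balance: int
    new_balance: int


class AccountStore:
    """Hypothetical production store that publishes an event on every write."""

    def __init__(self) -> None:
        self._balances: Dict[str, int] = {}
        self._subscribers: List[Callable[[AccountBalanceChanged], None]] = []

    def subscribe(self, handler: Callable[[AccountBalanceChanged], None]) -> None:
        self._subscribers.append(handler)

    def set_balance(self, account_id: str, new_balance: int) -> None:
        old = self._balances.get(account_id, 0)
        self._balances[account_id] = new_balance
        event = AccountBalanceChanged(account_id, old, new_balance)
        # Synchronous, in-process delivery stands in for a real message bus.
        for handler in self._subscribers:
            handler(event)

    def rows(self) -> Dict[str, int]:
        return dict(self._balances)


def batch_total(store: AccountStore) -> int:
    """ETL-style: read every row in one pass over the whole table."""
    return sum(store.rows().values())


class RunningTotal:
    """Stream-style: apply each change as a delta, with no table scan."""

    def __init__(self) -> None:
        self.total = 0

    def __call__(self, event: AccountBalanceChanged) -> None:
        self.total += event.new_balance - event.old_balance
```

After `store.subscribe(metric)` and three writes (`"a"` to 100, `"b"` to 50, `"a"` to 80), `metric.total` and `batch_total(store)` both equal 130. The running total only matches the batch result if the subscriber sees every change from the start. A real deployment would also need the write and the publish to succeed or fail together.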

ESP analyze streams of events using a language similar to SQL. In the same manner that databases and data warehouses use SQL to perform analysis of data tables, ESP use their query language to analyze streams of events. The simplest way to understand ESP is to think of events as rows in a table and the attributes of an event as the columns. Each event type is the equivalent of a table.

[…]

[ESP analyzes] the changes to your data as it occurs. Rather than doing batch ETL's, you stream business events as the state of your data changes. This creates a more manageable scaling model for your production system.

[…]

ESP can also be horizontally scaled, providing a more cost effective solution for your business. And since ESP is performing the analysis in real time, the business metrics can be current and remain that way as the business grows.
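The quoted passage describes three ideas: events as rows, a SQL-like query evaluated continuously, and horizontal scaling. All three can be sketched in a few lines of Python. This is a toy built on stated assumptions, not how any particular ESP product works. `OrderPlaced` is a hypothetical event type, and the query is roughly `SELECT region, SUM(amount) FROM OrderPlaced GROUP BY region`. Scaling out is modelled by hash-partitioning the stream on `region` across in-process workers.

```python
import zlib
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class OrderPlaced:
    """One event type ~ one table; each field ~ one column (hypothetical schema)."""
    region: str
    amount: int


class Worker:
    """Continuously evaluates SUM(amount) GROUP BY region for the keys routed to it."""

    def __init__(self) -> None:
        self.totals: Dict[str, int] = defaultdict(int)

    def on_event(self, event: OrderPlaced) -> None:
        # The result is updated as each event arrives, not recomputed in batch.
        self.totals[event.region] += event.amount


class PartitionedQuery:
    """Routes events by key so that each region is owned by exactly one worker."""

    def __init__(self, n_workers: int) -> None:
        self.workers: List[Worker] = [Worker() for _ in range(n_workers)]

    def _partition(self, key: str) -> int:
        # crc32 is stable across processes; Python's built-in hash() of str is salted per process.
        return zlib.crc32(key.encode("utf-8")) % len(self.workers)

    def submit(self, event: OrderPlaced) -> None:
        self.workers[self._partition(event.region)].on_event(event)

    def snapshot(self) -> Dict[str, int]:
        # Keys never span workers, so merging partial results is a plain union.
        merged: Dict[str, int] = {}
        for worker in self.workers:
            merged.update(worker.totals)
        return merged
```

`snapshot()` returns the same totals a single `Worker` would compute. Partitioning on the GROUP BY key means workers never coordinate. A query grouped on a different key would need either a repartition or a merge step that adds partial sums.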

Pritchett points out, however, that this approach does not support historical analysis, which would let the business examine its activity from a perspective other than the one chosen for real-time monitoring. One remedy he mentions is a framework for capturing and replaying transactions, although it would be rather costly. In a comment on the post, Tahir Akhtar suggests another option: replace ETL with ESP but keep using data warehousing applications. This preserves the ability to do historical analysis while taking advantage of ESP's scalability and reactivity.
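Both remedies depend on keeping the events themselves, not just the live aggregates. The sketch below assumes a hypothetical in-memory `EventLog` that stores every event as a JSON line. The live view tracks revenue by region, and a question nobody asked at the time, revenue by product, is answered later by replaying the log. In Akhtar's variant, the same event stream would also feed the data warehouse, which roughly plays the role of this log.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List


class EventLog:
    """Hypothetical append-only capture of every event, one JSON document per line."""

    def __init__(self) -> None:
        self._lines: List[str] = []

    def append(self, event: dict) -> None:
        self._lines.append(json.dumps(event, sort_keys=True))

    def replay(self) -> Iterator[dict]:
        # Re-deliver events in their original order.
        for line in self._lines:
            yield json.loads(line)


def revenue_by(field: str, events: Iterable[dict]) -> Dict[str, float]:
    """Sum "amount" grouped by any event attribute, whether live or replayed."""
    totals: Dict[str, float] = defaultdict(float)
    for event in events:
        totals[event[field]] += event["amount"]
    return dict(totals)
```

Every replay walks the entire log, so its cost grows with history. That fits Pritchett's caveat that capture-and-replay would be costly.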
