
Making Sense of Event Stream Processing

by Jan Stenberg on Mar 22, 2015. Estimated reading time: 2 minutes

Structuring data as a stream of events is an idea that appears in many areas, although unfortunately sometimes under different terminology, Martin Kleppmann explains when describing the fundamental ideas behind Stream Processing, Event Sourcing and Complex Event Processing (CEP).

Kleppmann, author of the upcoming book Designing Data-Intensive Applications, claims that a lot of the ideas are quite simple and worth learning about; they can help us build applications that are more scalable, reliable and maintainable.

Kleppmann takes Google Analytics, a tool for keeping track of pages viewed by visitors to a website, as an example of using events. With this tool, every page view results in an event containing e.g. the URL of the page, a timestamp and the client IP address, which may add up to a huge number of events for a popular website. To collate usage of the website from these events there are two options, both useful but in different scenarios:

  • Store all events in some type of data store and then use a query language to group by URL, time period and so on, essentially doing the aggregation when requested. An advantage of this technique is the ability to run new calculations on old data.
  • Aggregate on URL, time, etc. as the events arrive, without storing the raw events, e.g. using an OLAP cube. One advantage this gives is the ability to make decisions in real time, for instance limiting the number of requests from a specific client. Both options are sketched after this list.
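A minimal sketch of the two options, using hypothetical page-view event records and function names (these are illustrative assumptions, not taken from Kleppmann's talk):

```python
from collections import Counter, defaultdict

# Hypothetical page-view events, as in the Google Analytics example.
events = [
    {"url": "/home", "timestamp": "2015-03-22T10:00:00Z", "client_ip": "203.0.113.1"},
    {"url": "/home", "timestamp": "2015-03-22T10:00:05Z", "client_ip": "203.0.113.2"},
    {"url": "/pricing", "timestamp": "2015-03-22T10:00:07Z", "client_ip": "203.0.113.1"},
]

# Option 1: store the raw events and aggregate when a report is requested.
# New calculations can later be run over the same stored events.
def views_per_url(stored_events):
    return Counter(e["url"] for e in stored_events)

# Option 2: aggregate as events arrive, without keeping the raw events.
# Decisions, such as rate limiting a client, can then be made in real time.
views_by_client = defaultdict(int)

def on_event(event, limit=100):
    views_by_client[event["client_ip"]] += 1
    if views_by_client[event["client_ip"]] > limit:
        print("rate limit exceeded for", event["client_ip"])

print(views_per_url(events))  # Counter({'/home': 2, '/pricing': 1})
for e in events:
    on_event(e)
```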

Event sourcing is a similar idea coming from the Domain-Driven Design (DDD) community. A common example here is a shopping cart used in an e-commerce website. Instead of changing and storing the current state of the cart, every event that changes the state of the cart is stored. Examples of events are ItemAdded and ItemQuantityChanged. Replaying the events, or aggregating over them, will recreate the cart in its current state, and Kleppmann notes that this is very similar to the Google Analytics example.
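A sketch of that cart example, with assumed event shapes for ItemAdded and ItemQuantityChanged and a hypothetical replay function:

```python
# Hypothetical event records for the shopping-cart example.
cart_events = [
    {"type": "ItemAdded", "item": "book", "quantity": 1},
    {"type": "ItemAdded", "item": "pen", "quantity": 2},
    {"type": "ItemQuantityChanged", "item": "pen", "quantity": 5},
]

def replay(events):
    """Rebuild the current cart state by folding over the stored events."""
    cart = {}
    for e in events:
        if e["type"] == "ItemAdded":
            cart[e["item"]] = e["quantity"]
        elif e["type"] == "ItemQuantityChanged":
            cart[e["item"]] = e["quantity"]
    return cart

print(replay(cart_events))  # {'book': 1, 'pen': 5}
```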

For Kleppmann, using events is the ideal way of storing data: all information is saved by appending a single blob, removing the need to update several tables. He also sees aggregating data as the ideal way to read data from a data store, since the current state is normally what is interesting for a user. Comparing with a user interface, a user clicking a button corresponds to an event, and requesting a page corresponds to an aggregation that retrieves the current state. Kleppmann derives a pattern from his examples: raw input events are immutable facts, easy to store and the source of truth. Aggregates are derived from these raw events, cached, and updated when new events arrive. If needed, all events can be replayed to recreate all aggregates.
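One way to read that pattern as code, a sketch with illustrative names: the append-only event list is the source of truth, while the aggregate is a cache that is updated as each new event arrives and can always be rebuilt by replaying the log.

```python
class CartAggregate:
    """A cached read model: updated incrementally, rebuildable from the log."""

    def __init__(self):
        self.items = {}

    def apply(self, event):
        # Update the cached state when a new event arrives.
        if event["type"] in ("ItemAdded", "ItemQuantityChanged"):
            self.items[event["item"]] = event["quantity"]

    @classmethod
    def rebuild(cls, event_log):
        # Replay all immutable events to recreate the aggregate from scratch.
        aggregate = cls()
        for event in event_log:
            aggregate.apply(event)
        return aggregate

event_log = []  # append-only: the source of truth

def record(event, aggregate):
    event_log.append(event)   # write path: append the immutable fact
    aggregate.apply(event)    # read path: keep the cached aggregate fresh

cart = CartAggregate()
record({"type": "ItemAdded", "item": "book", "quantity": 1}, cart)
record({"type": "ItemQuantityChanged", "item": "book", "quantity": 3}, cart)
assert cart.items == CartAggregate.rebuild(event_log).items
```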

Moving to an event-sourcing-like approach is a move away from the traditional way databases are used, storing only the current state. Kleppmann’s reasons for still doing this include:

  • Loose coupling, achieved by the separate schemas used for writing and reading.
  • Increased performance, since separate schemas enable independent optimization of reads and writes and may also avoid (de)normalization discussions.
  • Flexibility, from the possibility of trying a new algorithm for creating aggregates, which can then either be discarded or replace the old algorithm (see the sketch after this list).
  • Error scenarios are easier to handle and reason about, thanks to the possibility of replaying the events leading up to a failure.
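A sketch of the flexibility point, with hypothetical projections that are not from the article: because the immutable events remain the source of truth, a new aggregation algorithm can be tried by replaying the same log, and then kept or thrown away.

```python
# Two read-model algorithms over the same immutable event log.
def total_items(event_log):
    """Existing aggregation: total quantity currently in the cart."""
    quantities = {}
    for e in event_log:
        quantities[e["item"]] = e["quantity"]
    return sum(quantities.values())

def edits_per_item(event_log):
    """Experimental aggregation: how often each item was touched."""
    counts = {}
    for e in event_log:
        counts[e["item"]] = counts.get(e["item"], 0) + 1
    return counts

event_log = [
    {"type": "ItemAdded", "item": "book", "quantity": 1},
    {"type": "ItemQuantityChanged", "item": "book", "quantity": 3},
    {"type": "ItemAdded", "item": "pen", "quantity": 2},
]

print(total_items(event_log))     # 5
print(edits_per_item(event_log))  # {'book': 2, 'pen': 1}
```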

Actor frameworks, e.g. Akka, Orleans and Erlang OTP, are also based on streams of immutable events, but Kleppmann notes that they are primarily a mechanism for dealing with concurrency, less so for data management.


events only useful if the experience can be recreated through simulation by William Louth

www.autoletics.com/posts/beyond-metrics-logging...

Beyond Metrics and Logging with Metered Software Memories
A proposal for a different approach to application performance monitoring that is far more efficient, effective, extensible and eventual than traditional legacy approaches based on metrics and event logging. Instead of seeing logging and metrics as primary datasources for monitoring solutions we should instead see them as a form of human inquiry over some software execution behavior that is happening or has happened. With this in mind it becomes clear that logging and metrics do not serve as a complete, contextual and comprehensive representation of software execution behavior.

Software memories allow us to employ multiple techniques of discovery, and they are not limited to what we know today and what tools and techniques are available to us at this time. If software machine (behavioral) memories can always be simulated, with the ability to augment each playback of a simulation, then there are no limits to what questions can be asked of the past in the present and future.

