Exploring Event Driven Architectures with Esper
Event driven application servers are a new category of server, providing a runtime and supporting infrastructure services (transport, security, event journaling, high availability, connectors, etc.) and designed to process over 100,000 events/sec. As well as processing events, event driven application servers can combine event information with long-lived historical data (usually obtained via relational database queries) and perform temporal correlation and matching on the event streams.
There are two concepts that make event systems different from messaging systems:
- Event stream processing (ESP) - monitors streams of event data, analyzes those events for matching conditions and then notifies listeners
- Complex event processing (CEP) - allows the detection of patterns among events
Fully featured event driven application servers are still a few years away, but developers can implement event driven architectures in stand-alone applications, Java EE applications and Spring applications today using Esper from Codehaus. Esper version 1.0 (as reported by InfoQ) was released in June 2006 and is a lightweight, embeddable open source implementation of both ESP and CEP.
Integrating Esper into stand alone applications is easy. The steps are:
- Obtain an Esper engine instance
- Create a statement (using the Esper query language)
- Register the statement with the engine
- Create a listener (by implementing a Java interface that will be triggered when the statement evaluates to true) and attach it to the statement
Events can be represented as Java objects, XML or maps. As they flow through the system, statements are evaluated and listener logic is executed.
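The four steps above can be sketched in a few lines. This is only an illustration of the engine / statement / listener pattern, written in Python rather than against Esper's Java API: `ToyEngine`, `Statement`, `create_statement` and `send_event` are hypothetical names invented for this sketch, and a plain predicate function stands in for a statement written in the Esper query language. Events are represented as maps, one of the representations mentioned above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Event = Dict[str, object]


@dataclass
class Statement:
    """A registered condition plus the listeners to notify when it matches."""
    predicate: Callable[[Event], bool]
    listeners: List[Callable[[Event], None]] = field(default_factory=list)

    def add_listener(self, listener: Callable[[Event], None]) -> None:
        self.listeners.append(listener)


class ToyEngine:
    """Hypothetical stand-in for an event engine; this is not Esper's API."""

    def __init__(self) -> None:
        self.statements: List[Statement] = []

    def create_statement(self, predicate: Callable[[Event], bool]) -> Statement:
        # Creating a statement also registers it with the engine.
        statement = Statement(predicate)
        self.statements.append(statement)
        return statement

    def send_event(self, event: Event) -> None:
        # Evaluate every registered statement against the incoming event
        # and call the listeners of each statement that matches.
        for statement in self.statements:
            if statement.predicate(event):
                for listener in statement.listeners:
                    listener(event)


engine = ToyEngine()                                            # obtain an engine
stmt = engine.create_statement(lambda e: e["zone"] == "dock")   # create and register a statement
alerts: List[Event] = []
stmt.add_listener(alerts.append)                                # attach a listener

engine.send_event({"asset": "A1", "zone": "dock"})
engine.send_event({"asset": "A2", "zone": "yard"})
```

After the two events are sent, `alerts` holds only the event whose zone is `"dock"`; the second event does not satisfy the statement, so the listener is never called for it.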
The Esper query language provides a rich syntax allowing complex temporal logic to be expressed, and includes features such as:
- Event filtering
- Sliding window and aggregation (count all assets reported in the last 30 seconds)
- Grouped windows and output rate limiting (get a count per zone over the last 10 minutes)
- Joins and outer joins (also joins between event streams)
- Integration with historic or reference data (accessing relational databases)
- Creation of virtual streams that all statements can access
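To make the sliding window and grouped window items concrete, here is a minimal sketch of a time window that keeps a count per key (for example, per zone) over the last N seconds. It is plain Python, not the Esper query language, and says nothing about how Esper implements windows internally. The class name `SlidingCountWindow` and its methods are hypothetical, and the sketch assumes events arrive in non-decreasing timestamp order.

```python
from collections import Counter, deque
from typing import Deque, Dict, Tuple


class SlidingCountWindow:
    """Counts events per key over the last `length_s` seconds."""

    def __init__(self, length_s: float) -> None:
        self.length_s = length_s
        self.events: Deque[Tuple[float, str]] = deque()  # (timestamp, key), oldest first
        self.counts: Counter = Counter()

    def _expire(self, now: float) -> None:
        # Drop events that have fallen out of the window and undo their counts.
        while self.events and self.events[0][0] <= now - self.length_s:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]

    def add(self, timestamp: float, key: str) -> None:
        self._expire(timestamp)
        self.events.append((timestamp, key))
        self.counts[key] += 1

    def snapshot(self, now: float) -> Dict[str, int]:
        self._expire(now)
        return dict(self.counts)


window = SlidingCountWindow(length_s=30)
window.add(0, "zone-1")
window.add(10, "zone-2")
window.add(25, "zone-1")
at_29 = window.snapshot(29)   # all three events are still inside the window
at_35 = window.snapshot(35)   # the event at t=0 has expired
```

Because counts are adjusted as events enter and leave the window, a snapshot never has to rescan the whole window; the trade-off is that every event in the window must be kept in memory until it expires.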
Even though event driven application servers are a few years away, Esper is ready for production use today. Integrating Esper into your applications is easy, and will allow you to provide features that anticipate business and customer needs in real time.
has anyone benchmarked esper?
by
peter lin
I should state that I work on rule engines and have been studying RETE for the last six years. Looking at Esper's node classes and design, my guess is Esper will likely run into scalability issues with 250K objects or more. The way joins are performed is going to lead to inefficient pattern matching, which will lead to huge CPU usage. A mature RETE engine can handle 1 million facts without any problems. This is from first hand experience.
peter lin
Re: has anyone benchmarked esper?
by
Thomas Bernhardt
Yep, we have a benchmark available from the RFID domain that tracks 3000 assets as they move from zone to zone, detecting when assets in a group split between zones. We have been able to get 110,000 events per second, sustained, on a laptop with a DualCore 2.4 GHz. Details are in the JavaOne slides downloadable from our site.
We also have various microbenchmarks in different examples and regression tests that we run as part of our regular build that process large data sets. Note there is no industry standard benchmark at this time. We are planning to have a performance test suite available that can be used for capacity planning, hopefully soon.
Esper is designed to process large volumes of streaming data on a continuous basis. It indexes join fields and is therefore able to handle patterns involving joins rather well.
Cheers
Tom, of the Esper team
Re: has anyone benchmarked esper?
by
peter lin
peter
Re: has anyone benchmarked esper?
by
Alex Vasseur
I just want to stress that Esper and ESP/CEP are about continuous streams and real-time filtering, aggregation and pattern detection among event streams. It is not a rule engine and I don't think we can really draw a comparison there (and this is not where the industry seems to be either).
At this stage I believe most users are more interested in the kind of problem EDA and ESP/CEP enable them to solve than in raw performance.
Peter, I thus assume all systems that have to deal with a 500K "object dataset" (unclear what we are talking about there) require some optimization. That said, when it comes to ESP/CEP engines like Esper there are 4 things you need to consider, and I'd be pleased to hear your comments on whether the same are key performance evaluation criteria in a rule engine (which I again argue is a different beast for a different purpose):
- number of statements configured
- throughput of input events coming in for evaluation
- matching or output ratio (or filtering ratio if you want)
- and latency (usually in the order of a few ms or less and real-time JVM or pauseless GC start to help us a lot there)
I believe the RFID sample that Tom commented on illustrates some of this and gave our audience figures to remember at the end of a 1 hour session (2000 statements, 100K events/s on a utility laptop, less than 2% matching ratio). I'd be happy to report some more on performance if you want to submit a use case that you would like us to consider for a benchmark (we are also looking at growing the committer base if you want to bring some of your knowledge on that!).
Did you wonder how (if at all) you could implement the RFID asset tracking example we have using something like Drools (RETE based rule engine, I don't know about its optimization)?
It might be you'll come up with another use case that can actually be solved by both a rule engine and an ESP/CEP engine. This is likely an overlap area (yes, there are some), and there at the borderline you may have deuce, or one solution may clearly outperform the other as it will be just the right tool for the task. It'd be nice to start such an exercise.
Alex (Esper team)
Re: has anyone benchmarked esper?
by
Thomas Bernhardt
-Tom (N)Esper rocks
Re: has anyone benchmarked esper?
by
peter lin
2000 X 60 seconds = 120,000 / minute
120,000 x 60 minutes = 7,200,000 / hour
Orders generally stay in an OMS for several hours, from the time an order is sent to the time the order is closed. The RFID example isn't all that interesting to me. I've worked on real-time pre-trade compliance systems that have to handle thousands of diversification rules. They include government regulations, aggregations, calculating risk (exposure) and rating. These are real-time, with transactions which are about 3.5K messages. Given the constantly shifting data, calculating the risk of a mutual fund using price aggregates, ratings, and aggregate ratings is rather challenging. Each calculation is rather simple by itself. The hard part is keeping up with the constant stream. An example rule in CLIPS format might look like this:
(defrule sector_weight
  (transaction
    (ticker ?ticker)
    (customer ?customerid)
    (quantity ?quan)
    (price ?price)
  )
  (security
    (ticker ?ticker) ;; this joins on transaction.ticker
    (sector ?sector)
    (country ?co)
  )
  (customer
    (id ?customerid) ;; this joins on transaction.customer
    (portfolio ?pid)
  )
  ?aggr <- (aggregate
    (portfolio ?pid) ;; this joins on customer.portfolio
    (sector ?sector) ;; this joins on security.sector
    (country ?co) ;; this joins on security.country
  )
  (bind ?newWeight (calculate-weight ?aggr ?quan ?price) ) ;; this calculates the new weight and binds it to a variable named "newWeight"
  (test (> ?newWeight 15%) )
  =>
  ;; send out a compliance violation message to the OMS, which stops the transaction
)
If I wasn't so lazy, I'd translate it to SQL syntax, but hopefully the example provides some context.
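For readers who don't read CLIPS, the rule's logic can be sketched in Python. This is a hedged illustration, not a translation of any real system: the joins become dictionary lookups, and the body of `calculate_weight` is an assumption, since the rule above only names a `calculate-weight` function without defining it. The `Aggregate` fields (`sector_value`, `portfolio_value`) and the sample data are likewise invented for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

LIMIT_PERCENT = 15.0  # the 15% threshold from the rule


@dataclass
class Aggregate:
    sector_value: float     # assumed: value already held in this sector/country
    portfolio_value: float  # assumed: total value of the portfolio


def calculate_weight(aggr: Aggregate, quantity: int, price: float) -> float:
    """Assumed formula: the sector's share of the portfolio, in percent,
    after the proposed trade is added."""
    trade_value = quantity * price
    return 100.0 * (aggr.sector_value + trade_value) / (aggr.portfolio_value + trade_value)


def check_sector_weight(
    transaction: Dict[str, object],
    securities: Dict[str, Dict[str, str]],
    customers: Dict[str, str],
    aggregates: Dict[Tuple[str, str, str], Aggregate],
) -> Optional[str]:
    security = securities[transaction["ticker"]]    # join on transaction.ticker
    portfolio = customers[transaction["customer"]]  # join on transaction.customer
    # join on customer.portfolio, security.sector and security.country
    aggr = aggregates.get((portfolio, security["sector"], security["country"]))
    if aggr is None:
        return None
    new_weight = calculate_weight(aggr, transaction["quantity"], transaction["price"])
    if new_weight > LIMIT_PERCENT:
        # In the rule, this is where the compliance violation goes to the OMS.
        return f"sector weight {new_weight:.1f}% exceeds {LIMIT_PERCENT:.0f}% for portfolio {portfolio}"
    return None


securities = {"ACME": {"sector": "tech", "country": "US"}}
customers = {"c1": "p1"}
aggregates = {("p1", "tech", "US"): Aggregate(sector_value=10_000.0, portfolio_value=100_000.0)}

small = check_sector_weight(
    {"ticker": "ACME", "customer": "c1", "quantity": 10, "price": 100.0},
    securities, customers, aggregates)
large = check_sector_weight(
    {"ticker": "ACME", "customer": "c1", "quantity": 100, "price": 100.0},
    securities, customers, aggregates)
```

With the sample data, the small trade leaves the sector at roughly 10.9% of the portfolio and passes, while the large trade pushes it to roughly 18.2% and produces a violation message. A production system would also update the aggregate incrementally as trades are accepted, which this sketch leaves out.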
I see quite a few CEP and ESP vendors tell people to use their system for algorithmic trading. Some try to make it sound like it's new, but it's basically what OMS systems have been doing for over a decade. Many of the existing systems have been using RETE to do real-time event processing. It's just that most people outside don't know about it. Then again, most of the firms using RETE to do EDA keep quiet, since they consider it a strategic advantage. The military has been using RETE to do event filtering for command control systems for over a decade. You can imagine how much data a military command control system handles per second with radar and satellite data streaming in 24/7.
peter
Re: has anyone benchmarked esper?
by
peter lin
peter
Re: has anyone benchmarked esper?
by
Thomas Bernhardt
-Tom
Re: has anyone benchmarked esper?
by
Thomas Bernhardt
-Tom
Re: has anyone benchmarked esper?
by
peter lin
If I have time later tonight I'll try to translate that rule to SQL. One could break that up into multiple SQL queries or one with several subqueries.
peter
Re: has anyone benchmarked esper?
by
peter lin
select
  cus.portfolio, sec.sector, sec.country, trans.quantity, trans.price
into
  aggregate
from
  transaction trans, security sec, customer cus
where
  trans.ticker = sec.ticker and cus.id = trans.customer

select
  trans.*
from
  aggregate aggr, transaction trans, security sec, customer cus
where
  trans.customer = cus.id and trans.ticker = sec.ticker and
  sec.sector = aggr.sector and cus.portfolio = aggr.portfolio and
  sec.country = aggr.country and aggr.newWeight > 15%
This kind of rule is actually fairly simple in the pre-trade compliance world. Other rules are more complex and calculate risk across the firm or a large mutual fund.
peter
Question about Esper
by
peter lin
peter
Re: Question about Esper
by
Alex Vasseur
Alex
Re: Question about Esper
by
peter lin
The sliding window then defines a condition which says "if x condition happens at a max/min of x time, then do something". The kind of real-time processes I've worked with are OMS related. This means there is no max/min sliding window.
There are thousands of transactions in the system and all of them have a different expiration time. In a system like an OMS, it can't do win:time(30 sec) because that doesn't make any sense. A sell order might say minimum price of xx.xx dollars and x shares. If any buy matches that price and shares, it should go through immediately. Waiting for 30 seconds could mean someone else fills that order.
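A minimal sketch of the behaviour described here, in Python: each sell order carries its own expiration time instead of sharing a fixed window, and an incoming buy is matched the moment it arrives. The names (`SellOrder`, `OrderBook`, `on_buy`) are hypothetical, and the matching condition (buy price at or above the minimum, same number of shares) is an assumption for illustration, not a description of how any real OMS matches orders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SellOrder:
    order_id: str
    min_price: float
    shares: int
    expires_at: float  # every order has its own expiration time


class OrderBook:
    def __init__(self) -> None:
        self.open_sells: List[SellOrder] = []

    def add_sell(self, order: SellOrder) -> None:
        self.open_sells.append(order)

    def on_buy(self, now: float, price: float, shares: int) -> Optional[SellOrder]:
        # Drop orders whose individual expiration time has passed.
        self.open_sells = [o for o in self.open_sells if o.expires_at > now]
        # Match immediately against the first qualifying open order.
        for order in self.open_sells:
            if price >= order.min_price and shares == order.shares:
                self.open_sells.remove(order)
                return order
        return None


book = OrderBook()
book.add_sell(SellOrder("s1", min_price=20.00, shares=100, expires_at=60.0))
book.add_sell(SellOrder("s2", min_price=19.50, shares=100, expires_at=5.0))

too_low = book.on_buy(now=10.0, price=19.75, shares=100)  # s2 has expired, s1 asks for more
filled = book.on_buy(now=11.0, price=20.00, shares=100)   # fills s1 right away
repeat = book.on_buy(now=12.0, price=20.00, shares=100)   # nothing left to fill
```

The first buy finds no match because the cheaper order has already expired, the second fills `s1` as soon as it arrives, and the third finds an empty book.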
The more I look at CEP/ESP, the less useful it becomes. When I compare RETE to Esper, I'm only looking at the compilation of the query. RETE provides one of the most efficient ways of compiling a query into an optimized query plan. Those who say RETE is not a good fit for EDA either A) have never bothered to study RETE or B) have misconceptions about what RETE is.
I see a lot of people saying "RETE is wrong for EDA, CEP, ESP" and go on to show a SQL-like query to prove their point. The first part of RETE is compiling a statement into an optimal query plan. The second part is the runtime indexing. One can apply RETE compilation and forget the runtime, or adapt the algorithm for temporal execution. RETE compilation has nothing to do with whether "time is a first class citizen". It is just an efficient method for compiling statements into optimal relational queries.
Just because some rule engines don't support temporal logic, does not mean RETE is wrong or inappropriate as some commercial vendors are saying. To my knowledge, Esper has never made such a claim, but I have seen some commercial CEP vendors make those kinds of statements. For those who forget, temporal logic is one of the areas that AI and expert systems have pushed. Many of the advances in temporal logic came from AI research.
I'm curious, how does a CEP system handle removal of events, or in rule engine terms, an event being retracted? I've tried to read up on CEP/ESP the last few days and the literature seems rather thin compared to the mountain of literature on pattern matching and rule engines.
my biased 2 cents
peter
Re: Question about Esper
by
Alex Vasseur
I'd again be happy to consider a use case with real working code. If I ask an ESP/CEP engine to handle the OMS side of things, it might be that a rule engine will do a better job. If I ask a rule engine to detect a triple bottom pattern on a stock tick, it might be that an ESP/CEP engine will do a better job. It could be that the RFID sample we have in our Esper JavaOne slides can be solved both ways. If anyone wants to give it a try with their favorite rule engine, please do. The statements are in the slides and all the running code + demo GUI is in our SVN.
By the way, it happens a very similar thread on RETE 'vs/for' CEP/ESP was started here. Could you confirm you posted there as well on May 7 - ie Peter == woolfel? The posts from this pseudo look very similar if not the same...
Alex
Re: Question about Esper
by
peter lin
I've built pre-trade compliance systems using JESS, so I have a little bit of experience building real-time compliance for OMS. Most diversification rules require the system to incrementally recalculate the aggregate as transactions come in. The aggregates are basically multi-dimensional aggregates and vary between 12-20 dimensions.
peter
Re: Question about Esper
by
peter lin
"On whether Rete fits CEP/ESP, I think making time and causality a first class citizen is likely to deeply impact any Rete algorithm implementation. Let's leave aside clustering and near real time performance requirements, or joins to relational databases and continuous joins. There has been extensive research in the CEP/ESP field and also around Rete and I'd tend to argue if researchers haven't come up with a common implementation so far, this is likely because some walls were hit.
Both play a key role in an EDA but I don't believe in a one size fits all there."
I'll be blunt here. Very few people understand RETE well enough to implement a high performance rule engine. There's maybe 2 dozen people who know RETE well enough to implement a high performance engine: Dr. Forgy, Gary Riley, Ernest Friedman-Hill, a few researchers at iLog, Paul Haley and a few guys who worked at ART.
The statement about enhancing RETE so that "time is a first class citizen" is untrue. In my blog, I provide several detailed descriptions of how one can enhance RETE to support temporal logic. The type of temporal logic used in business systems is only a tiny subset of the temporal logic used in AI. There isn't a consensus among AI researchers about the best way to handle temporal logic because the AI case is 100x harder to handle than the simple business cases. I have blog entries that attempt to describe temporal logic and how it differs between AI and business rules.
If you want an invite to my blog, email me at Woolfel AT gamil DOT com.
peter
Re: Question about Esper
by
d taye
How does Esper stack up against what's outlined in "Coral8 Guide To Evaluating ESP Engines"?
thanks
Re: Question about Esper
by
Alex Vasseur
The fact that you don't point me to a real implementation or research papers but to some posts in a blog just makes my point: I would tend to argue convergence has not happened yet.
In any case we are pretty happy with Esper performance thus far and so are our users, and if at some time we get smart enough to understand RETE (i.e. find time to study it properly, which I haven't so far) and feel it'd bring something to our users, we'll certainly work on changing our underlying implementation. We'd welcome contributions on that if you are interested.
Alex
Re: Question about Esper
by
Alex Vasseur
- first, it is not neutral. Coral8 is one of the many vendors in the ESP/CEP space.
- second, there are no answers from Coral8 itself.
- third, a number of questions are irrelevant to Esper, as it is designed to be embedded (how easy is the engine to install) and runs on Java and .Net (are 64bit processors supported)
Which makes me suggest that if you are conducting an RFI on ESP/CEP engines, I'd be pleased to help evaluate Esper in your own context.
A fundamental difference though is that Esper is open source.
Alex
Re: Question about Esper
by
peter lin
If you want a real world application, sadly I can't provide any. I do know of systems using RETE, but that code belongs to those companies. Like I said in a previous comment, I do plan to build a "real world" compliance scenario for real-time trading systems, but I haven't finished it. More accurately, I don't have enough free time to implement a full application.
I do know some firms are experimenting with ESP/CEP products to do hedge fund stuff. Actually, many firms have been building these types of systems since mid 90's. They aren't general purpose solutions.
There's no point in re-implementing RETE when you can use JBossRules for the pattern matching. Not only do you get an efficient RETE implementation, you get support for many first order logic concepts like existential, negation, forall and collect. I haven't read the spec for StreamSql, but from the examples I've seen so far, I don't think it is expressive enough to support FOL.
Do you know if StreamSql supports existential, negation, forall and collection?
peter
Re: Question about Esper
by
d taye
Also, updating the license to something more friendly would be helpful.
So here's the question again: how does it stack up against the Coral8 guide and Stonebraker's rules?
Re: Question about Esper
by
Thomas Bernhardt
In terms of the Stonebraker rules, here are some short answers:
(a) Keep the Data Moving
Esper has no costly storage operation in the critical path, and our test results show extremely low latency.
(b) Query using SQL on Streams
The query language implements SQL and provides extensions for event stream processing and pattern matching/CEP.
(c) Handle Stream Imperfections (Delayed, Missing and Out-of-Order Data)
Some of the features that Esper provides particularly to deal with these issues are joins, outer joins, patterns, subqueries and data windows.
(d) Generate Predictable Outcomes
Yep, we have worked hard to get deterministic and predictable processing under multi-threaded conditions.
(e) Integrate Stored and Streaming Data
Esper allows SQL queries to be placed right within the query language and provides expiry-time or LRU caches.
(f) Guarantee Data Safety and Availability
Esper does not currently offer a persistence mechanism for events. We are working on an HA feature set.
(g) Partition and Scale Applications Automatically
We are working towards these goals in the HA feature set.
(h) Process and Respond Instantaneously.
We achieve that through highly optimized filtering, query planning, indexing and execution and other optimizations.
Re: Question about Esper - scalability across multiple JVMs
by
Mahesh Venkat
Is it possible to elaborate on the scalability of Esper across multiple JVMs running on the same or different physical machines?
If two Esper engines are running in two JVMs as part of existing applications, how do you ensure that only one Esper engine picks up the event? Also, how do you load balance events among multiple Esper engines that are subscribed to the same events?
A typical example would be downloading a large document.
If I want to build subscribers that listen to the completion of the downloaded documents event and act upon this event, how do you ensure the events are load balanced across multiple Espers running on multiple JVMs?
Re: has anyone benchmarked esper?
by
som sengupta