time-series drools complex-event-processing esper

How do CEP rules engines store time data?


I'm thinking about designing an event processing system. The rules per se are not the problem. What bugs me is how to store event data so that I can efficiently answer questions/facts like:

If the number of events of type A in the last 10 minutes equals N, and the average number of events of type B per minute over the last M hours is Z, and the current running average of another metric is Y... then fire some event (or store a new fact/event).

How do Esper/Drools/MS StreamInsight store their time-dependent data so that they can efficiently calculate event stream properties? Do they just store it in SQL databases and continuously query them?

Do they preprocess the rules so they can know beforehand what "knowledge" they need to store?

Thanks

EDIT: I found that what I want is called Event Stream Processing, and the Wikipedia example shows what I would like to do:

WHEN Person.Gender EQUALS "man" AND Person.Clothes EQUALS "tuxedo"
FOLLOWED-BY
  Person.Clothes EQUALS "gown" AND
  (Church_Bell OR Rice_Flying)
WITHIN 2 hours
ACTION Wedding

Still the question remains: how do you implement such a data store? The key is "WITHIN 2 hours" and the ability to process thousands of events per second.


Solution

  • Esper analyzes the rule and stores only derived state (aggregations and the like, if any) plus, if the rule needs it, a subset of the events. Esper also allows defining contexts, as described in the book by Opher Etzion and Peter Niblett, which I recommend reading. By specifying a context, Esper can minimize the amount of state it retains, and queries can become easier to read.
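
To make "only stores derived state" a bit more concrete, here is a minimal Python sketch of the general idea. It is not Esper's actual implementation, and the class names `SlidingWindowCount` and `RunningAverage` are made up for illustration. It assumes events arrive in timestamp order and that timestamps are plain seconds. A "count in the last 10 minutes" condition only needs the timestamps still inside the window, and a running average only needs a count and a sum, so no event payloads and no database queries are involved.

```python
from collections import deque


class SlidingWindowCount:
    """Hypothetical helper: counts events within the last `window_seconds`.

    Assumes add() is called with non-decreasing timestamps.
    """

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        # Only timestamps are retained, never the full events.
        self._timestamps = deque()

    def add(self, timestamp):
        self._timestamps.append(timestamp)
        self._expire(timestamp)

    def count(self, now):
        self._expire(now)
        return len(self._timestamps)

    def _expire(self, now):
        # Drop timestamps that are `window_seconds` old or older.
        cutoff = now - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()


class RunningAverage:
    """Hypothetical helper: running mean kept as two numbers."""

    def __init__(self):
        self.n = 0
        self.total = 0.0

    def add(self, value):
        self.n += 1
        self.total += value

    def value(self):
        # None until at least one value has been seen.
        return self.total / self.n if self.n else None


# Usage: count type-A events over a 10-minute (600 s) window.
type_a = SlidingWindowCount(window_seconds=600)
for t in (0, 200, 650):
    type_a.add(t)
print(type_a.count(now=700))  # 2: the event at t=0 has left the window
```

Each update is amortized O(1), and memory is bounded by the number of events inside the window rather than by the full history. That is roughly why knowing the rule up front lets an engine retain so little.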