
Overriding time used for time windows in Esper


I am working on a CEP project where I analyze logs from a file in bulk. The file is a compressed CSV file that is bulk-transferred to my analytics machine every hour; each line contains an event with a timestamp of exactly when it happened during the previous hour.

Reading this file into a plain Java object is no problem and I will typically end up with something like this:

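import java.util.Date;

class MyEvent {
    private Date timestamp;   // when the event actually happened
    private String message;   // shortened to these fields only for simplicity
    private String source;
    private int count;

    public Date getTimestamp() { return timestamp; }
    public String getMessage() { return message; }
    public String getSource() { return source; }
    public int getCount() { return count; }
}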

So the problem is that this file may contain events that were written anywhere between 1 hour ago and 1 second ago, and the only way to know is to inspect the timestamp field in the event itself. When loading these events into Esper, they will all be loaded within a few seconds (there will probably be tens of thousands, and they will be loaded as fast as Esper can accept them).

Now, the analysis itself needs to calculate the average "count" per "source" every 5 minutes in Esper (nothing too complex). However, as all events are loaded within a few seconds, the time window in Esper will be wrong, and all events may fall within the same time window regardless of when they were produced. So my question is: is there any way to override what is counted as the event timestamp in Esper time windows?

The problem also increases when the time window is split between two files that are loaded with an hour delay.

Thank you.


Solution

  • This will do it: select source, sum(count) from MyEvent group by source output all every 5 seconds

    Esper also allows an external timer, so application code can control time freely.
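
To make the idea of event-time windows concrete, here is a minimal plain-Java sketch. It does not use the Esper API (the exact Esper calls for external timing depend on the version and are not shown); it only illustrates what driving windows from the event's own timestamp achieves. The class `EventTimeWindows`, the simplified `Event` type (epoch milliseconds instead of `Date`), and the choice to align 5-minute windows to the epoch are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: groups events into 5-minute windows by their own
// timestamps, independent of how fast they are loaded.
class EventTimeWindows {
    public static final long WINDOW_MS = 5 * 60 * 1000L;

    // Hypothetical, simplified stand-in for MyEvent.
    public static final class Event {
        final long timestampMs;
        final String source;
        final int count;

        public Event(long timestampMs, String source, int count) {
            this.timestampMs = timestampMs;
            this.source = source;
            this.count = count;
        }
    }

    // Start of the 5-minute window (aligned to the epoch) containing the timestamp.
    public static long windowStart(long timestampMs) {
        return Math.floorDiv(timestampMs, WINDOW_MS) * WINDOW_MS;
    }

    // Returns: window start -> source -> average count in that window.
    public static Map<Long, Map<String, Double>> averagePerWindow(List<Event> events) {
        // Per window and source, keep {sum of counts, number of events}.
        Map<Long, Map<String, long[]>> acc = new TreeMap<>();
        for (Event e : events) {
            long[] sumAndN = acc
                .computeIfAbsent(windowStart(e.timestampMs), k -> new TreeMap<>())
                .computeIfAbsent(e.source, k -> new long[2]);
            sumAndN[0] += e.count;
            sumAndN[1]++;
        }
        Map<Long, Map<String, Double>> result = new TreeMap<>();
        acc.forEach((start, bySource) -> {
            Map<String, Double> averages = new TreeMap<>();
            bySource.forEach((src, s) -> averages.put(src, (double) s[0] / s[1]));
            result.put(start, averages);
        });
        return result;
    }

    public static void main(String[] args) {
        List<Event> events = new ArrayList<>();
        events.add(new Event(0L, "a", 2));        // first window
        events.add(new Event(60_000L, "a", 4));   // first window
        events.add(new Event(300_000L, "a", 10)); // second window
        System.out.println(averagePerWindow(events));
        // prints {0={a=3.0}, 300000={a=10.0}}
    }
}
```

Because the window is derived only from the timestamp, the result does not depend on how quickly the events are fed in. Handling a window that spans two hourly files would additionally require keeping the accumulator between loads, which this sketch does not do.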