Search code examples
apache-flinkflink-cep

Is there a way to discard partial matches?


I'm monitoring the movement of vehicles. Whenever they enter or exit a geofence, an event is triggered. Via Flink I want to check whether a vehicle was within a fence for at least 2 hours.

Here's the pattern I'm using:

begin("enter").notFollowedBy("exit").within(Time.hours(2)).followedBy("final-exit")

The thing is: the partial "enter" match stays valid even if the notFollowed "exit" is true, i.e. if the vehicle returns a couple of days later and triggers another exit-Fence event, it gets pass the notFollowedBy-rule and results in a complete match.

Is there a way to tell Flink to discard a partial match actively?


Solution

  • Logically it makes sense that the partial "enter" match stays valid if it is not followed by "exit" within two days, since in that case it was not followed by "exit" within two hours. Can you describe under what circumstances you want such a partial match to timeout? If you want it to timeout after 8 hours, for example, then I think that behavior could be described by

    begin("enter")
      .notFollowedBy("exit").within(Time.hours(2))
      .followedBy("final-exit").within(Time.hours(8))
    

    UPDATE

    If your objective is to filter out all cases where the vehicles enter and leave the geofence within 2 hours, and then properly process the remaining events, then you could run a pipeline that has two stages. The first stage can be described as

    begin("enter")
      .followedBy("quick-exit").within(Time.hours(2))
    

    and then you can drop the events that match this pattern, and do further processing on those that timeout. See Handling Timed Out Partial Patterns for details on how to use processTimedOutMatch to capture and emit the timed out partial patterns, which will be the ones that you want to consider for further processing.