apache-flink · stream-processing

Apache Flink - Compute the last window based on event time


My job does the following things (a minimal sketch follows the list):

  1. Consumes events from a Kafka topic based on event time.
  2. Computes windows of 7 days with a slide of 1 day.
  3. Sinks the results to Redis.
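
For concreteness, here is a minimal sketch of such a job, assuming the newer `KafkaSource` API; the broker address, topic name, key selector, and aggregation are placeholders, and `print()` stands in for the Redis sink (e.g. Bahir's `RedisSink`):

```java
import java.time.Duration;

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.SlidingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class SlidingWindowJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("localhost:9092")  // placeholder broker
                .setTopics("events")                    // placeholder topic
                // Start reading from a timestamp 7 days in the past
                // instead of the latest offset.
                .setStartingOffsets(OffsetsInitializer.timestamp(
                        System.currentTimeMillis() - Duration.ofDays(7).toMillis()))
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        env.fromSource(source,
                        // Kafka record timestamps are used as event time.
                        WatermarkStrategy.<String>forBoundedOutOfOrderness(Duration.ofMinutes(1)),
                        "kafka-events")
                .keyBy(event -> 0)  // placeholder key selector: single key
                // 7-day windows sliding by 1 day, aligned to the epoch by default.
                .window(SlidingEventTimeWindows.of(Time.days(7), Time.days(1)))
                .reduce((a, b) -> a + "," + b)  // placeholder aggregation
                .print();  // stand-in for the Redis sink

        env.execute("sliding-window-job");
    }
}
```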

I have several issues:

  1. If the job starts consuming from the latest Kafka record, then after it has been running for 1 day it closes a window and computes a 7-day result. The problem is that at that point the job only has 1 day of data, so the results are wrong.
  2. If I instead let it consume Kafka events starting from a timestamp 7 days in the past, the job computes every intermediate window from the first day onward as it starts, which takes a long time. Also, I only want the results of the last window, because that is what matters to me.

Have I missed something? Is there a better way to do that?


Solution

  • Flink aligns time windows to the epoch. So if you have windows that are one hour long, they run from the top of the hour to the top of the hour. Day-long windows run from midnight to midnight. The same principle applies to windows that are seven days long, and since the epoch began on a Thursday (Jan 1, 1970), a window that is seven days long should close at midnight on Wednesday night / Thursday morning (see the alignment sketch after this answer).

    You can supply an offset to the window assigner if you want to shift the windows to start at a different time.
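
A minimal sketch of supplying that offset when building the assigner; the 8-hour shift is purely illustrative (e.g. aligning day boundaries to UTC-8 rather than UTC):

```java
import org.apache.flink.streaming.api.windowing.assigners.SlidingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class OffsetWindows {
    public static void main(String[] args) {
        // 7-day windows sliding by 1 day, with boundaries shifted back 8 hours.
        // Flink requires abs(offset) < slide, so an 8-hour offset is valid
        // against a 1-day slide.
        SlidingEventTimeWindows assigner =
                SlidingEventTimeWindows.of(Time.days(7), Time.days(1), Time.hours(-8));

        System.out.println(assigner);
    }
}
```

TumblingEventTimeWindows has an equivalent of(size, offset) overload, so tumbling daily windows can be shifted the same way.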
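
To make the alignment concrete: Flink derives a window's start as `start = timestamp - ((timestamp - offset + windowSize) % windowSize)`, which is exposed through the `TimeWindow.getWindowStartWithOffset` utility. A minimal sketch (the sample timestamp is arbitrary):

```java
import java.time.Instant;

import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

public class WindowAlignment {
    public static void main(String[] args) {
        long sevenDaysMs = 7L * 24 * 60 * 60 * 1000;

        // Arbitrary event timestamp: 2021-06-15T12:00:00Z in epoch milliseconds.
        long timestamp = 1_623_758_400_000L;

        // With offset 0 the start is aligned to the epoch, so a 7-day window
        // always begins at midnight on a Thursday (Jan 1, 1970 was a Thursday).
        long start = TimeWindow.getWindowStartWithOffset(timestamp, 0L, sevenDaysMs);

        System.out.println("start: " + Instant.ofEpochMilli(start));               // 2021-06-10T00:00:00Z, a Thursday
        System.out.println("end:   " + Instant.ofEpochMilli(start + sevenDaysMs)); // 2021-06-17T00:00:00Z
    }
}
```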