Search code examples
apache-kafkaapache-kafka-streamsstream-processing

What happens if session window ends before retention period and inactivity gap ends after retention period?


Here is simple session window using Kafka Streams:

stream
  .groupBy()
  .windowedBy(SessionWindows.with(Duration.ofMinutes(30)).grace(Duration.ofMinutes(0)))
  .aggregate(...) // implementation of aggregate function

Using the following piece of code, we can configure state store:

Materialized
  .as(Stores.persistentSessionStore(storeName, Duration.ofHours(2))
  .withCachingEnabled()
  .withLoggingEnabled()
  .withKeySerde(keySerde)
  .withValueSerde(valueSerde)

Documentation states:

Note that the retention period must be at least long enough to contain the windowed data's entire life cycle, from window-start through window-end, and for the entire grace period.

We don't apply grace period. But consider this scenario: Session window ends before retention period but inactivity gap ends after retention period. I would like to know, is there a chance of session data loss? How aggressive is cleanup applied?


Solution

  • It seems to be a c&p error from TimeWindowed store.

    Compare the code:

    https://github.com/apache/kafka/blob/2.3/streams/src/main/java/org/apache/kafka/streams/kstream/internals/SessionWindowedKStreamImpl.java#L186

    https://github.com/apache/kafka/blob/2.3/streams/src/main/java/org/apache/kafka/streams/kstream/internals/TimeWindowedKStreamImpl.java#L166

    I created a JIRA ticket to get it fixed: https://issues.apache.org/jira/browse/KAFKA-9068