apache-kafka, apache-kafka-streams

Query Kafka Streams state store within a time range


I want to query a Kafka Streams state store based on a time range. The use case is that my Streams processor will be scheduled to run every 30 seconds. During each invocation, I want to query the state store, but only for the entries that are "new" (added since the previous invocation). I thought TimestampedKeyValueStore might help, but I couldn't find the right APIs to do it. Is it possible to query the state store based on a time range (and with an exactly-once guarantee)?


Solution

  • You cannot query a KeyValueStore based on a time range, because that does not align with the semantics of the store. Queries are always by key, and a TimestampedKeyValueStore only stores an additional timestamp next to the value; it does not index entries by that timestamp.

    You could use a WindowStore instead. A window store is basically also just a key-value store, but it stores a timestamp next to the key (not next to the value; there is also a TimestampedWindowStore that does both). This allows you to query time ranges.
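    The WindowStore approach could look roughly like the following minimal sketch. It assumes the Kafka 3.x Processor API (org.apache.kafka.streams.processor.api), String keys and values, and hypothetical names chosen only for illustration: the class NewEntriesTopology, the topics "input" and "output", the store "recent-entries", and a 10-minute retention. Each record is written to the window store with its wall-clock processing time as the timestamp, and a wall-clock punctuation every 30 seconds reads the range [last run, now) with fetchAll(Instant, Instant) and forwards those entries downstream.

    ```java
    import java.time.Duration;
    import java.time.Instant;

    import org.apache.kafka.common.serialization.Serdes;
    import org.apache.kafka.streams.KeyValue;
    import org.apache.kafka.streams.Topology;
    import org.apache.kafka.streams.kstream.Windowed;
    import org.apache.kafka.streams.processor.PunctuationType;
    import org.apache.kafka.streams.processor.api.Processor;
    import org.apache.kafka.streams.processor.api.ProcessorContext;
    import org.apache.kafka.streams.processor.api.Record;
    import org.apache.kafka.streams.state.KeyValueIterator;
    import org.apache.kafka.streams.state.Stores;
    import org.apache.kafka.streams.state.WindowStore;

    public class NewEntriesTopology {
        // Hypothetical topic and store names, used only for this sketch.
        static final String INPUT_TOPIC = "input";
        static final String OUTPUT_TOPIC = "output";
        static final String STORE = "recent-entries";
        static final Duration INTERVAL = Duration.ofSeconds(30);

        static class NewEntriesProcessor implements Processor<String, String, String, String> {
            private ProcessorContext<String, String> context;
            private WindowStore<String, String> store;
            // Start of the next query range. Kept in memory only, so it is
            // NOT restored after a restart or rebalance (resets to 0).
            private long lastRun = 0L;

            @Override
            public void init(final ProcessorContext<String, String> context) {
                this.context = context;
                this.store = context.getStateStore(STORE);
                // Run emitNewEntries every 30 seconds of wall-clock time.
                context.schedule(INTERVAL, PunctuationType.WALL_CLOCK_TIME, this::emitNewEntries);
            }

            @Override
            public void process(final Record<String, String> record) {
                // Use processing (wall-clock) time as the timestamp stored next to
                // the key, so "new" means "processed since the last punctuation".
                store.put(record.key(), record.value(), context.currentSystemTimeMs());
            }

            private void emitNewEntries(final long now) {
                // fetchAll bounds are inclusive, so this reads [lastRun, now - 1];
                // an entry stored at exactly `now` is picked up by the next run.
                try (KeyValueIterator<Windowed<String>, String> it =
                         store.fetchAll(Instant.ofEpochMilli(lastRun), Instant.ofEpochMilli(now - 1))) {
                    while (it.hasNext()) {
                        final KeyValue<Windowed<String>, String> entry = it.next();
                        context.forward(new Record<>(entry.key.key(), entry.value, entry.key.window().start()));
                    }
                }
                lastRun = now;
            }
        }

        static Topology buildTopology() {
            final Topology topology = new Topology();
            topology.addSource("source", Serdes.String().deserializer(), Serdes.String().deserializer(), INPUT_TOPIC);
            topology.addProcessor("new-entries", NewEntriesProcessor::new, "source");
            topology.addStateStore(
                Stores.windowStoreBuilder(
                    // Retention of 10 minutes (hypothetical), window size equal to
                    // the punctuation interval, no duplicate retention.
                    Stores.persistentWindowStore(STORE, Duration.ofMinutes(10), INTERVAL, false),
                    Serdes.String(), Serdes.String()),
                "new-entries");
            topology.addSink("sink", OUTPUT_TOPIC, Serdes.String().serializer(), Serdes.String().serializer(), "new-entries");
            return topology;
        }
    }
    ```

    Using processing time instead of record.timestamp() is a deliberate choice in this sketch: with event time, a late record whose timestamp falls before the last run would never be returned. Also note that lastRun lives only in memory, so after a restart all entries still within retention would be emitted again; storing the cursor in a small key-value store would be one way to make it fault tolerant, which matters if you need exactly-once behavior end to end.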

    Update

    In the upcoming Kafka 3.7 release, new query types are being added that will allow more advanced queries. In particular, versioned key-value stores (VersionedKeyValueStore) will support time-range queries (cf. KIP-968 and others).
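    A time-range query against a versioned store might look like the sketch below. It assumes Kafka Streams 3.7 or later, a store materialized as a versioned key-value store (for example via Stores.persistentVersionedKeyValueStore), String keys and values, and a hypothetical helper class VersionedHistoryQuery with caller-supplied store name. Note that the KIP-968 query (MultiVersionedKeyQuery) returns the versions of a single key within the time range, so on its own it does not return all new entries across keys; see KIP-968 for the exact bound semantics.

    ```java
    import java.time.Instant;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.kafka.streams.KafkaStreams;
    import org.apache.kafka.streams.query.MultiVersionedKeyQuery;
    import org.apache.kafka.streams.query.QueryResult;
    import org.apache.kafka.streams.query.StateQueryRequest;
    import org.apache.kafka.streams.query.StateQueryResult;
    import org.apache.kafka.streams.state.VersionedRecord;
    import org.apache.kafka.streams.state.VersionedRecordIterator;

    public class VersionedHistoryQuery {
        // Builds a KIP-968 query for the versions of one key between `from` and `to`.
        static MultiVersionedKeyQuery<String, String> historyQuery(final String key, final Instant from, final Instant to) {
            return MultiVersionedKeyQuery.<String, String>withKey(key).fromTime(from).toTime(to);
        }

        // Runs the query against a running KafkaStreams instance (hypothetical
        // store name supplied by the caller) and collects the versions returned
        // by every local partition that answered successfully.
        static List<VersionedRecord<String>> history(final KafkaStreams streams, final String storeName,
                                                     final String key, final Instant from, final Instant to) {
            final StateQueryRequest<VersionedRecordIterator<String>> request =
                StateQueryRequest.inStore(storeName).withQuery(historyQuery(key, from, to));
            final StateQueryResult<VersionedRecordIterator<String>> result = streams.query(request);

            final List<VersionedRecord<String>> versions = new ArrayList<>();
            for (final QueryResult<VersionedRecordIterator<String>> partitionResult
                    : result.getPartitionResults().values()) {
                // Skip failed partitions (e.g. store not ready) and empty results.
                if (!partitionResult.isSuccess() || partitionResult.getResult() == null) {
                    continue;
                }
                try (VersionedRecordIterator<String> it = partitionResult.getResult()) {
                    while (it.hasNext()) {
                        versions.add(it.next()); // each record has value() and timestamp()
                    }
                }
            }
            return versions;
        }
    }
    ```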