apache-kafka, apache-kafka-streams

What happens to the Kafka state store when you use the application reset tool?


What happens to your state store when you run the Kafka Streams application reset tool to reset the app to a particular timestamp (say T-n)?

The documentation reads: "Internal topics: Delete the internal topic (this automatically deletes any committed offsets)". (Internal topics are the topics a Kafka Streams application uses internally while executing, for example the changelog topics for state stores.)

Does this mean that I lose the state of my state store/RocksDB as it was at T-n?

For example, let's say I was processing a "Session Window" on the state store at that timestamp. It looks like I'll lose all existing data within that window during an application reset.

Is there possibly a way to preserve the state of the Session Window when resetting an application? In other words, is there a way to preserve the state of my state store or RocksDB (at T-n) during an application reset?


Solution

  • The reset tool itself will not touch the local state store; however, it will delete the corresponding changelog topics. So yes, you effectively lose your state.

    Thus, to keep your local state in sync with the changelog, you should delete the local state too and start with an empty state: https://docs.confluent.io/current/streams/developer-guide/app-reset-tool.html#step-2-reset-the-local-environments-of-your-application-instances

    It is currently not possible to also reset the state to a specific point in time.

    The only "workaround" might be to skip the reset tool and use bin/kafka-consumer-groups.sh to modify only the input topic offsets. This way you preserve the changelog topics and local state stores. However, when you restart the app, the state will of course be in its last state, not the state as of T-n. Not sure if this is acceptable.
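The workaround above could look roughly like the following sketch. The broker address, application id, topic name, and timestamp are placeholders (assumptions, not values from the question). A Kafka Streams app uses its application.id as the consumer group id, and the app has to be stopped first, because offsets can only be reset while the group is inactive. By default the script only prints the commands so they can be reviewed; run it with RUNNER set to an empty value to actually invoke the tool.

```shell
#!/bin/sh
# Sketch: rewind only the input topic offsets of a Kafka Streams app,
# leaving changelog topics and local state stores untouched.
# All values below are hypothetical placeholders.
BOOTSTRAP="localhost:9092"            # assumed broker address
APP_ID="my-streams-app"               # the app's application.id (= consumer group id)
INPUT_TOPIC="my-input-topic"          # hypothetical input topic
RESET_TO="2024-01-01T00:00:00.000"    # target time, format YYYY-MM-DDTHH:mm:SS.sss

reset_cmd() {
  # $1 is either --dry-run (show the planned offsets) or --execute (apply them).
  # ${RUNNER-echo} expands to "echo" when RUNNER is unset, so the command is
  # only printed; with RUNNER set to empty, the tool itself is executed.
  ${RUNNER-echo} bin/kafka-consumer-groups.sh \
    --bootstrap-server "$BOOTSTRAP" \
    --group "$APP_ID" \
    --topic "$INPUT_TOPIC" \
    --reset-offsets --to-datetime "$RESET_TO" \
    "$1"
}

reset_cmd --dry-run    # inspect the new offsets first
reset_cmd --execute    # then commit them
```

Because the state stores are kept, records reprocessed from T-n will be applied on top of the existing state, so aggregates such as session windows may end up counting the same input twice.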