I have a Kafka Streams application that uses a state store (backed by RocksDB).
All the stream thread does is read data from a Kafka topic and put it into the state store. (Another thread reads data from the state store and does the business-logic processing.)
I observed that it creates a new Kafka topic whose name ends in "changelog" because of the state store.
But I don't understand what purpose this "changelog" topic serves.
When change logging is enabled for a state store (it is on by default), Kafka Streams writes every update to the store as a record in a changelog topic in Kafka. The RocksDB files live only on the local disk of the instance running the task, so on their own they are not fault-tolerant. The changelog topic acts as durable, fault-tolerant storage for the state, allowing the state to be restored after an application restart or failure.
Let's take the word-count example.
When a word is processed multiple times, the state store updates the count for that word, and each of these updates is written to the changelog topic.
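The changelog topic for the word-count-store therefore receives one record per update: the key is the word and the value is the new count after that update. A word seen three times produces three records for the same key, with values 1, 2 and 3.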
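To make the record stream concrete, here is a plain-Java simulation. It is not the Kafka Streams API: the class name `WordCountChangelog`, the `HashMap` standing in for RocksDB and the `ArrayList` standing in for the changelog topic are all hypothetical. It only mirrors the idea that every store update is also appended to the changelog as a (key, new value) record.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical simulation (not Kafka Streams code) of a change-logged store:
 * each put updates the local store AND appends a (key, newValue) record
 * to an in-memory list that plays the role of the changelog topic.
 */
public class WordCountChangelog {
    /** One changelog record: key = word, value = count after the update. */
    static final class ChangeRecord {
        final String key;
        final long value;

        ChangeRecord(String key, long value) {
            this.key = key;
            this.value = value;
        }

        @Override
        public String toString() {
            return key + " -> " + value;
        }
    }

    final Map<String, Long> store = new HashMap<>();        // stands in for the RocksDB store
    final List<ChangeRecord> changelog = new ArrayList<>(); // stands in for the changelog topic

    void process(String word) {
        long newCount = store.merge(word, 1L, Long::sum); // update local state
        changelog.add(new ChangeRecord(word, newCount));   // log the new value for this key
    }

    public static void main(String[] args) {
        WordCountChangelog app = new WordCountChangelog();
        for (String w : "hello world hello kafka hello".split(" ")) {
            app.process(w);
        }
        app.changelog.forEach(System.out::println);
    }
}
```

Running it prints one record per update, in order: `hello -> 1`, `world -> 1`, `hello -> 2`, `kafka -> 1`, `hello -> 3`.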
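If the Kafka Streams application restarts, or the task fails over to another instance, the word-count-store can be rebuilt by replaying the changelog topic (from the beginning when no usable local state exists). Because later records for a key overwrite earlier ones during the replay, the restored store ends up with the latest count for every word. This keeps the state consistent and up to date across application instances.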
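The replay step can be sketched in plain Java as follows. This is a hypothetical simulation, not Kafka Streams' internal restore code: the changelog is modelled as an ordered list of `Map.Entry<String, Long>` records, and, following Kafka's convention, a `null` value is treated as a tombstone meaning the key was deleted.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical simulation of restoring a key-value store from its changelog.
 * Records are replayed in offset order; a null value is a tombstone (delete).
 */
public class ChangelogRestore {
    static Map<String, Long> restore(List<Map.Entry<String, Long>> changelog) {
        Map<String, Long> store = new HashMap<>();
        for (Map.Entry<String, Long> record : changelog) {
            if (record.getValue() == null) {
                store.remove(record.getKey());                 // tombstone: key was deleted
            } else {
                store.put(record.getKey(), record.getValue()); // later value overwrites earlier
            }
        }
        return store;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> changelog = List.of(
                new SimpleEntry<>("hello", 1L),
                new SimpleEntry<>("world", 1L),
                new SimpleEntry<>("hello", 2L),
                new SimpleEntry<>("kafka", 1L),
                new SimpleEntry<>("world", null)); // "world" was deleted
        System.out.println(restore(changelog));
    }
}
```

The restored map contains `hello=2` and `kafka=1` and no entry for `world`, i.e. exactly the state the store had after the last update.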
To limit storage and the volume of changelog data, Kafka Streams creates changelog topics for key-value stores with log compaction enabled (`cleanup.policy=compact`) by default. Compaction eventually retains only the latest record for each key, which is all that is needed to restore the full state, while keeping storage requirements and restore time down.
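Why compaction is safe for a changelog can be checked with a small plain-Java simulation. This is hypothetical code, not the broker's log cleaner: `compact` keeps only the last record per key while preserving offset order, and `restore` replays a log into a map. Real Kafka compaction runs in the background, never touches the active segment, and removes tombstones only after `delete.retention.ms`, so a real topic is only eventually compacted.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical simulation of log compaction on a changelog. */
public class ChangelogCompaction {
    /** Keep only the last record for each key, preserving offset order. */
    static List<Map.Entry<String, Long>> compact(List<Map.Entry<String, Long>> log) {
        Map<String, Integer> lastOffset = new HashMap<>();
        for (int offset = 0; offset < log.size(); offset++) {
            lastOffset.put(log.get(offset).getKey(), offset); // later offsets win
        }
        List<Map.Entry<String, Long>> compacted = new ArrayList<>();
        for (int offset = 0; offset < log.size(); offset++) {
            if (lastOffset.get(log.get(offset).getKey()) == offset) {
                compacted.add(log.get(offset));
            }
        }
        return compacted;
    }

    /** Replay a log into a map; a null value is a tombstone (delete). */
    static Map<String, Long> restore(List<Map.Entry<String, Long>> log) {
        Map<String, Long> store = new HashMap<>();
        for (Map.Entry<String, Long> record : log) {
            if (record.getValue() == null) {
                store.remove(record.getKey());
            } else {
                store.put(record.getKey(), record.getValue());
            }
        }
        return store;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> log = List.of(
                new SimpleEntry<>("hello", 1L),
                new SimpleEntry<>("world", 1L),
                new SimpleEntry<>("hello", 2L),
                new SimpleEntry<>("kafka", 1L),
                new SimpleEntry<>("hello", 3L));
        List<Map.Entry<String, Long>> compacted = compact(log);
        System.out.println("full log: " + log.size() + " records, compacted: "
                + compacted.size() + " records");
        System.out.println("same state after restore: "
                + restore(log).equals(restore(compacted)));
    }
}
```

It prints `full log: 5 records, compacted: 3 records` and `same state after restore: true`: the compacted log is smaller but rebuilds the identical store.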