Kafka Streams remote state dir

I know that we can configure a state.dir in Kafka Streams for stateful operations. The state is local to the instance, which is what makes lookups fast.
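As a rough illustration, this is where state.dir sits in a Kafka Streams configuration. It is a minimal sketch, not a complete application: the application id, broker address and path are hypothetical, and the keys are written as plain strings ("application.id", "bootstrap.servers", "state.dir", the values of the corresponding StreamsConfig constants) so the snippet compiles without the kafka-streams jar.

```java
import java.util.Properties;

public class StateDirConfig {

    // Builds a Kafka Streams configuration. The keys are the string values of
    // StreamsConfig.APPLICATION_ID_CONFIG, BOOTSTRAP_SERVERS_CONFIG and STATE_DIR_CONFIG.
    static Properties build(String applicationId, String bootstrapServers, String stateDir) {
        Properties props = new Properties();
        props.setProperty("application.id", applicationId);
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Local directory where persistent state stores (RocksDB by default) are kept.
        props.setProperty("state.dir", stateDir);
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical values; in a real application these props would be passed to
        // new KafkaStreams(topology, props).
        Properties props = build("orders-app", "localhost:9092", "/var/lib/kafka-streams");
        System.out.println(props.getProperty("state.dir"));
    }
}
```

Because the stores live on the instance's own disk, reads go to local RocksDB files rather than over the network, which is why the lookups are fast.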

One problem with this approach is that if your application runs in a containerized environment, the state is lost once you restart or redeploy your application. One solution is to point state.dir at an external (network-attached) directory. The downside is that key lookups will be slower, but the benefit is that the state is persisted outside the container, so it is kept even after the container restarts.

Do you think this is a good approach for preventing unnecessary state restoration on restarts in a containerized environment (other than StatefulSets in Kubernetes; we don't use k8s yet)?

Solution

One problem with this approach is that if your application runs in a containerized environment, the state is lost once you restart or redeploy your application

Not necessarily. You can attach disks to your containers and, using StatefulSets (Kubernetes), re-attach the same disks after a restart and thus preserve the state.
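Re-attaching a disk only helps if the restarted instance looks for its state in the same place. The sketch below assumes the default on-disk layout <state.dir>/<application.id>/<taskId>, where a task id has the form "<subtopology>_<partition>" (for example "0_3"); the helper names taskDir and hasLocalState are hypothetical and only illustrate why both the mount path and the application.id must stay stable across restarts.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class StateDirLayout {

    // Assumed default layout: <state.dir>/<application.id>/<subtopology>_<partition>.
    static Path taskDir(String stateDir, String applicationId, int subtopology, int partition) {
        return Paths.get(stateDir, applicationId, subtopology + "_" + partition);
    }

    // True if a directory for this task exists, i.e. the same volume was re-attached
    // at the same path and the application.id did not change.
    static boolean hasLocalState(String stateDir, String applicationId, int subtopology, int partition) {
        return Files.isDirectory(taskDir(stateDir, applicationId, subtopology, partition));
    }

    public static void main(String[] args) {
        // Hypothetical mount path and application id.
        Path dir = taskDir("/var/lib/kafka-streams", "orders-app", 0, 3);
        System.out.println(dir);
        System.out.println(hasLocalState("/var/lib/kafka-streams", "orders-app", 0, 3));
    }
}
```

Kafka Streams itself decides how much of the changelog to replay, based on the checkpoint file it writes next to the stores; the check above is only meant to show what has to survive the restart, not to replace that logic.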

Cf. https://www.confluent.io/kafka-summit-sf18/deploying-kafka-streams-applications/

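Network file systems often cause issues, so they are not recommended for the state directory. To get quicker fail-over, you can use standby tasks instead: they keep replicated copies of a task's state on other instances, so the task can move to an instance that already holds most of its state instead of restoring it from scratch.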
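Standby tasks are enabled through the num.standby.replicas setting (StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, default 0). A minimal sketch, assuming an existing base configuration; the helper name withStandbys and the example values are hypothetical.

```java
import java.util.Properties;

public class StandbyConfig {

    // Returns a copy of the base config with standby replicas enabled.
    // "num.standby.replicas" is the value of StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG.
    static Properties withStandbys(Properties base, int standbys) {
        Properties props = new Properties();
        props.putAll(base);
        props.setProperty("num.standby.replicas", Integer.toString(standbys));
        return props;
    }

    public static void main(String[] args) {
        Properties base = new Properties();
        base.setProperty("application.id", "orders-app");        // hypothetical
        base.setProperty("bootstrap.servers", "localhost:9092"); // hypothetical
        // One standby copy of each stateful task, kept on a different instance.
        Properties props = withStandbys(base, 1);
        System.out.println(props.getProperty("num.standby.replicas"));
    }
}
```

Standbys only help when at least two instances of the application are running, since a standby is never placed on the same instance as its active task. The trade-off is extra disk space and changelog consumption on the instances hosting the standbys.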