Tags: kubernetes, apache-flink, flink-streaming

Flink Kubernetes S3 state support


I've been looking at the documentation for Flink Kubernetes Operator v1.10. Is there a way to preconfigure the cluster so that all submitted jobs use the RocksDB state backend with some predefined S3 path? What would be required for that to work? I've been trying to set jobs up with an S3 backend, but Flink says the s3 scheme is not supported and that I need to enable the S3 plugin, and I'm unsure how to go about that.


Solution

  • You should:

    1. Enable the RocksDB State Backend: Set state.backend: rocksdb in the Flink configuration, and point state.checkpoints.dir at an S3 path (e.g. s3://<your-bucket>/checkpoints) for checkpoint storage.

    2. Enable the S3 Plugin: Include the S3 plugin in your Flink image or deployment. Copy the flink-s3-fs-hadoop or flink-s3-fs-presto jar into its own subdirectory under the plugins directory (Flink will not pick it up from lib or from plugins directly).

    3. Provide S3 Credentials: Configure access keys using environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY) or add them to flink-conf.yaml.

    4. Deploy on Kubernetes: Use a custom Flink Docker image with the S3 plugin enabled, or mount the plugin directory into your Kubernetes pods.
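Putting steps 1–3 together, a FlinkDeployment along these lines should work. The bucket name, credentials, and Flink version below are placeholders, not values from your setup. ENABLE_BUILT_IN_PLUGINS is an environment variable understood by the official Flink Docker images: at startup the entrypoint moves the named jar from /opt/flink/opt into the plugins directory, so you can enable the S3 filesystem without building a custom image:

```yaml
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: s3-state-example
spec:
  image: flink:1.19          # official image; version tag is a placeholder
  flinkVersion: v1_19
  flinkConfiguration:
    state.backend: rocksdb
    state.checkpoints.dir: s3://<your-bucket>/checkpoints   # placeholder bucket
    state.savepoints.dir: s3://<your-bucket>/savepoints
    # The S3 filesystem plugins read these keys; alternatively use the
    # standard AWS environment variables or IAM roles instead.
    s3.access-key: <AWS_ACCESS_KEY_ID>
    s3.secret-key: <AWS_SECRET_ACCESS_KEY>
  podTemplate:
    spec:
      containers:
        - name: flink-main-container
          env:
            # Tells the official image's entrypoint to enable the bundled
            # S3 plugin; the jar name must match the image's Flink version.
            - name: ENABLE_BUILT_IN_PLUGINS
              value: flink-s3-fs-hadoop-1.19.1.jar
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
```

To apply these settings to every submitted job rather than per deployment, the operator's Helm chart also exposes a defaultConfiguration section in its values, whose flink-conf.yaml entries are merged into the configuration of every job the operator deploys.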
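If you would rather bake the plugin into a custom image (step 4), a Dockerfile sketch like the following does it. The distribution already ships the plugin jars under /opt/flink/opt; each plugin must live in its own subdirectory of /opt/flink/plugins. The base image tag is a placeholder and should match your deployment's Flink version:

```dockerfile
# Placeholder version tag; pick the tag matching your Flink deployment
FROM flink:1.19

# Each plugin needs its own subdirectory under plugins/
RUN mkdir -p /opt/flink/plugins/s3-fs-hadoop && \
    cp /opt/flink/opt/flink-s3-fs-hadoop-*.jar /opt/flink/plugins/s3-fs-hadoop/
```

Build and push this image, then reference it in the spec.image field of your FlinkDeployment (or mount the plugin directory into the pods instead).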