apache-flink, flink-streaming, flink-cep

How can I share state between my Flink jobs?


I run multiple jobs from my .jar file and I want to share state between them. However, every job consumes all of the input (from Kafka), so they generate duplicate output. In the Flink dashboard, 'Records Sent' is 3 for every one of my jobs; I would expect that count to be split across the jobs.

I create the jobs with this command:

bin/flink run app.jar

How can I fix this?


Solution

  • Because of its focus on scalability and high performance, Flink state is local. Flink doesn't really provide a mechanism for sharing state between jobs.

    However, Flink does support splitting up a large job among a fleet of workers. A Flink cluster is able to run a single job in parallel, using the resources of one or many multi-core CPUs. Some Flink jobs are running on thousands of cores, just to give an idea of its scalability.

    When used with Kafka, each Kafka partition can be read by a different subtask in Flink and processed by its own parallel instance of the pipeline; a sketch of such a job appears at the end of this answer.

    You might begin by running a single instance of your job in parallel via

    bin/flink run --parallelism <parallelism> app.jar

    For this to succeed, your cluster must have at least as many free slots as the parallelism you request. The parallelism should also be less than or equal to the number of partitions in the Kafka topic(s) being consumed, since any subtasks beyond the partition count will have nothing to read. The Flink Kafka consumers coordinate among themselves, with each one reading from one or more partitions.
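
    To make the partition-to-subtask assignment concrete, here is a minimal sketch of such a job, assuming the KafkaSource connector available in recent Flink releases; the broker address, topic name, and group id are illustrative placeholders:

    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.connector.kafka.source.KafkaSource;
    import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class PartitionedKafkaJob {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Each parallel subtask of this source is assigned a subset of the
            // topic's partitions, so records are read once per job,
            // not once per subtask.
            KafkaSource<String> source = KafkaSource.<String>builder()
                    .setBootstrapServers("localhost:9092")             // placeholder broker
                    .setTopics("events")                               // placeholder topic
                    .setGroupId("my-pipeline")                         // placeholder group id
                    .setStartingOffsets(OffsetsInitializer.earliest())
                    .setValueOnlyDeserializer(new SimpleStringSchema())
                    .build();

            env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-source")
               .map(value -> "processed: " + value)                    // stand-in for real logic
               .print();

            env.execute("single parallel job");
        }
    }

    Submitted with bin/flink run --parallelism 3 app.jar against a topic with three partitions, each of the three source subtasks reads exactly one partition, so the work is divided rather than duplicated.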
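
    The parallelism can also be fixed inside the program instead of on the command line; a one-line sketch (the value 3 is illustrative and would normally match the partition count):

    env.setParallelism(3);  // overrides any parallelism passed to bin/flink run

    Note that a parallelism set this way takes precedence over the -p / --parallelism flag, so it is less flexible if the partition count changes later.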