Search code examples
apache-flinkflink-streaming

Pause or slow down considerably Flink (Kafka) source until dependencies are ready


I'm designing a Flink job that basically it's going to:

  • Read messages from Kafka
  • Process the incoming messages' data by requesting some more information from external services — for instance, make some HTTP/gRPC calls, retrieve some data, do some aggregation and store some partial results in a database
  • Propagate results into Elasticsearch

What I don't have clear is how to slow down, or completely pause the Flink job from reading Kafka messages whenever one of the (RESTful / gRPC) services is down, or simply decreasing their performance.

Is there a way to achieve this by using some configuration settings or some Flink built-in component?


Solution

  • Flink will automatically slow down the rate at which data is read from Kafka due to backpressure. When the function that's making REST/gRPC calls can't keep up with the data coming from Kafka, its input network buffers will fill up, and data "pushes" from the Kafka source to that operator will be delayed until space opens up.