
Flink streaming parameters to tune?


I am currently working on a project titled "Automatic Tuning for the Flink Streaming Framework".
Basically, we aim to create a model (a reinforcement learning agent) that selects the best values for Flink parameters. A similar problem occurs in the Spark framework, for example: choosing the right configuration can be challenging, and not doing it correctly may have a significant impact on performance.

What I would like to know is:

  1. Aside from code optimization, are there parameters that require tuning in a streaming job for Flink?
  2. Is there a shortlist of parameters that we need to focus on, created by experts?
  3. Does choosing the right parameters require a trainable model (a sophisticated process), or is it simply not that challenging?

Thank you.


Solution

  • There are a lot of parameters that can, in some cases, have a significant impact on the performance of Flink applications. But I don't think you could train a model that would learn anything useful. The parameter space is vast, and a change that helps one application under some circumstances probably won't even help that same application running in a different context (e.g., at a different scale), let alone prove useful for tuning other applications.
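
To give a rough feel for how quickly the parameter space grows, here is a minimal Python sketch that counts the combinations in a small search grid. The keys are meant to be Flink configuration options (exact names can differ between Flink versions), while the candidate values and the helper names `grid_size` and `configurations` are assumptions made up for this illustration, not a recommended shortlist.

```python
from itertools import product
from math import prod

# Assumed search grid: the keys are Flink configuration options (names may
# vary by Flink version); the candidate values are invented for illustration.
CANDIDATES = {
    "parallelism.default": [1, 2, 4, 8, 16, 32],
    "taskmanager.numberOfTaskSlots": [1, 2, 4, 8],
    "execution.buffer-timeout": ["0 ms", "10 ms", "100 ms"],
    "execution.checkpointing.interval": ["10 s", "30 s", "60 s", "300 s"],
    "taskmanager.memory.network.fraction": [0.05, 0.1, 0.2],
    "state.backend": ["hashmap", "rocksdb"],
}


def grid_size(candidates):
    """Number of distinct configurations in the grid."""
    return prod(len(values) for values in candidates.values())


def configurations(candidates):
    """Lazily yield every configuration in the grid as a dict."""
    keys = list(candidates)
    for values in product(*(candidates[key] for key in keys)):
        yield dict(zip(keys, values))


if __name__ == "__main__":
    print("configurations to evaluate:", grid_size(CANDIDATES))
```

Even this coarse grid over six settings has 1,728 combinations, and, as noted above, the results measured for one application at one scale would probably not carry over to another.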