Search code examples
apache-flinkflink-streamingstream-processing

Flink - Building the operator graph


Good morning everybody,

I have already used Apache Storm to build topologies and I found that a good thing about the API they expose is the possibility to "manually" connect the operators in the graph topology.
You can create loops, for example.

I was wondering if there is a best practice to achieve the same "expressivity" in Flink.

Thank you so much!


Solution

  • Cyclic topologies are not supported in Flink. You can perform iterations through a specific operator. Except for cycles, you define your graph through the standard API and it's rather flexible compared to, for example, Spark. Many DataSet and DataStream API accept both functions and custom implementations of classes like RichMapFunction,RichFlatMapFunction and so on. This gives a huge degree of flexibility and customizability together with modularity and reusability. It takes some time to go beyond the standard API and learn how to customize your Flink Jobs properly but it's worth it.

    Flink has an "easy-mode", that resembles the API of Spark, in which you can do most of the stuff you need. When you want to express stuff that is out of the scope and use cases of the standard API, instead of doing weird workarounds like you have to in Spark, you can work directly with a layer that is partially below the standard API. There are many pieces that you can extend and customize and then plug in place of the provided operators/triggers/sources/sinks and so on. This is mostly documented feature by feature.