I have a use case where I want to run two independent processing flows on Flink. The two flows would look like:
Source1 -> operator1 -> Sink1
Source2 -> operator2 -> Sink2
I want to re-use the same Flink cluster for both flows. I can think of two ways to do this:
1) Submit two different jobs to the same Flink cluster
2) Set up two pipelines in the same job
I was able to set up the first option, but I'm not sure how to do the second. Has anyone tried such a setup before? What are the advantages of one over the other?
You can simply create multiple pipelines (with separate or shared source consumers) in your setupJob() method. Here is an example:
private <T> void buildPipeline(StreamExecutionEnvironment env, String sourceName, String sinkName) {
    // Each call wires an independent source -> operator chain onto the shared environment.
    DataStream<T> stream = env
            .addSource(getInputs().get(sourceName))
            .name(sourceName);
    stream = stream.filter(evt -> filter());
    ....
}
@Override
public void setupJob(AthenaFlinkJobConfiguration jobConfig, StreamExecutionEnvironment env) throws Exception {
    ...
    // Two disjoint pipelines built on the same StreamExecutionEnvironment
    // end up as independent branches of a single job graph.
    buildPipeline(env, sourceTopic1, sink1, ...);
    buildPipeline(env, sourceTopic2, sink2, ...);
    ...
}
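Note that setupJob() and AthenaFlinkJobConfiguration above come from a framework wrapper, not vanilla Flink. For reference, here is a minimal, self-contained sketch of the same idea using only the plain DataStream API, with fromElements() and print() as stand-ins for your real sources and sinks:

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class TwoPipelinesJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Pipeline 1: Source1 -> operator1 -> Sink1
        DataStream<String> flow1 = env
                .fromElements("a", "b", "c")   // stand-in for Source1
                .map(String::toUpperCase)      // operator1
                .name("operator1");
        flow1.print("sink1");                  // stand-in for Sink1

        // Pipeline 2: Source2 -> operator2 -> Sink2
        DataStream<Integer> flow2 = env
                .fromElements(1, 2, 3)         // stand-in for Source2
                .map(i -> i * 2)               // operator2
                .name("operator2");
        flow2.print("sink2");                  // stand-in for Sink2

        // A single execute() submits one job whose graph contains both
        // disjoint pipelines; they run in parallel within the same job.
        env.execute("two-pipelines-in-one-job");
    }
}

Everything registered on the environment before execute() ends up in the same job graph, which is exactly what makes option 2 work.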
Here is a quick contrast of the two approaches. The pros/cons of using separate jobs:
- Isolation: a failure, restart, or redeployment of one flow does not touch the other.
- Each job has its own checkpoints, savepoints, restart strategy, and parallelism, so the two flows can be tuned, scaled, and upgraded independently.
- The cost is operational overhead: two submissions to automate, two jobs to monitor, and two sets of savepoints to manage.
The benefits of using separate pipelines in a single job:
- A single artifact to build, submit, and monitor, with lower operational overhead.
- One shared job lifecycle: one checkpoint configuration and one set of job-level settings covers both flows.
- The main caveat is fault isolation: cancelling or fully restarting the job affects both flows. With regional failover, however, a task failure only restarts the pipelined region it belongs to, and since the two graphs are disjoint, that means only the affected flow (see the configuration note below).
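For context, regional failover is controlled by a standard Flink configuration key and has been the default since Flink 1.10; on older versions you would set it explicitly in flink-conf.yaml:

# flink-conf.yaml
# Restart only the pipelined region that contains the failed task, rather than
# the whole job graph. With two disjoint pipelines in one job, a failure in
# one flow leaves the other flow running.
jobmanager.execution.failover-strategy: region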