Search code examples
joinapache-flinkenvironmentflink-streaming

Flink: Are multiple execution environments supported?


Is it OK to create multiple ExecutionEnvironments in a Flink program? More specifically, create one ExecutionEnvironment and one StreamExecutionEnvironment in the same main method, so that one can work with batch and later transit to streaming without problems?

I guess that the other possibility would be to split the program in two, but for my testing purposes this seems better. Is Flink prepared for this scenario?


All seems to work fine, except I am currently having problems with no output when joining two streams on a common index and using window(TumblingProcessingTimeWindows.of(Time.seconds(1))). I have already called setStreamTimeCharacteristic(TimeCharacteristic.EventTime) on the StreamExecutionEnvironment and even tried assigning custom watermarks on both joined streams with assignTimestampsAndWatermarks where I just return System.currentTimeMillis() as the timestamp of each record. Since it finishes really quickly, both streams should fit in that 1-second window, no? Both streams print just fine right before the join. I can try supplying the important parts of code (it's rather lengthy) if anyone's interested.


UPDATE: OK, so I separated the two environments (put each inside a main method) and then I simply call the first main from the second main method. The described problem no longer occurs.


Solution

  • No, this not supported, and won't really work.

    At least up through Flink 1.9, a given application must either have an ExecutionEnvironment and use the DataSet API, or a StreamExecutionEnvironment and use the DataStream API. You cannot mix the two in one application.

    There is ongoing work to more completely unify batch and streaming, but that's a work in progress. To understand this better you might want to watch the video for this recent Flink Forward talk when it becomes available.