apache-flink flink-sql

Flink DataSet: can Flink respect the order of processing across multiple flows / inputs?


In my Flink batch program (DataSet / Table API), I read multiple files, which produces several separate flows. Each flow does some processing and saves its result with an output format.
Because Flink uses a dataflow model and my flows are not related to each other, they are processed in parallel.

However, I want Flink to respect at least the order of my output operations, because I want flow1 to be saved before flow2.

For example, I have something like:

Table table1 = tableEnv.fromTableSource(new MyTableSource1());
DataSet<Obj1> dataSet1 = tableEnv.toDataSet(table1.select("toto",..), Obj1.class);
dataSet1.output(new WateverdatasinkSQL());

Table table2 = tableEnv.fromTableSource(new MyTableSource2());
DataSet<Obj2> dataSet2 = tableEnv.toDataSet(table2.select("foo","bar",..), Obj2.class);
dataSet2.output(new WateverdatasinkSQL());

I want Flink to wait until dataSet1 has been saved before it continues.
How can I run these as successive operations?
I have already looked at the execution modes, but they do not achieve this.

Regards, Bastien


Solution

  • The easiest solution is to separate both flows into individual jobs and execute them one after the other.

    // Job 1: only the first flow is part of the plan when execute() is called.
    Table table1 = tableEnv.fromTableSource(new MyTableSource1());
    DataSet<Obj1> dataSet1 = tableEnv.toDataSet(table1.select("toto",..), Obj1.class);
    dataSet1.output(new WateverdatasinkSQL());
    env.execute(); // returns once job 1 has finished (default, attached execution)

    // Job 2: defined and submitted only after job 1 has completed.
    Table table2 = tableEnv.fromTableSource(new MyTableSource2());
    DataSet<Obj2> dataSet2 = tableEnv.toDataSet(table2.select("foo","bar",..), Obj2.class);
    dataSet2.output(new WateverdatasinkSQL());
    env.execute();
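
To illustrate why two separate execute() calls give the ordering, here is a minimal plain-Java sketch of the idea. It deliberately does not use Flink: MiniEnv, addSink and runInOrder are hypothetical stand-ins that mimic a lazily built plan whose sinks only start when execute() is called. It also assumes that execute() blocks until its job has finished, as in the default attached execution.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Toy model of "define sinks lazily, run them on execute()". NOT Flink's API. */
class SequentialJobsSketch {

    /** Hypothetical stand-in for a batch environment. */
    static final class MiniEnv {
        private final List<Runnable> pendingSinks = new ArrayList<>();

        /** Registers a sink; nothing runs yet. */
        void addSink(Runnable sink) {
            pendingSinks.add(sink);
        }

        /** Starts all pending sinks concurrently, waits for all of them, then clears the plan. */
        void execute() {
            List<CompletableFuture<Void>> running = new ArrayList<>();
            for (Runnable sink : pendingSinks) {
                running.add(CompletableFuture.runAsync(sink));
            }
            for (CompletableFuture<Void> job : running) {
                job.join(); // block until this sink is done
            }
            pendingSinks.clear();
        }
    }

    /** Runs two hypothetical flows as two separate jobs and returns the write order. */
    static List<String> runInOrder() {
        List<String> written = Collections.synchronizedList(new ArrayList<>());
        MiniEnv env = new MiniEnv();

        env.addSink(() -> written.add("flow1"));
        env.execute(); // job 1: only flow1 is in the plan

        env.addSink(() -> written.add("flow2"));
        env.execute(); // job 2: starts only after job 1 returned

        return written;
    }

    public static void main(String[] args) {
        System.out.println(runInOrder()); // prints [flow1, flow2]
    }
}
```

If both sinks were registered before a single execute() call, this toy model would start them concurrently, which mirrors the behaviour described in the question. The price of splitting into two jobs is that each job is planned and run separately.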