Tags: frameworks, apache-flink, apache-nifi, flink-streaming

Multiple Streams support in Apache Flink Job


My question is regarding the Apache Flink framework.

Is there any way to support more than one streaming source, such as Kafka and Twitter, in a single Flink job? Is there any workaround? Can we process more than one streaming source at a time in a single Flink job?

I am currently working with Spark Streaming, and this is a limitation there.

Is this achievable with other streaming frameworks like Apache Samza, Storm, or NiFi?

Any response would be much appreciated.


Solution

  • Yes, this is possible in Flink and Storm (no clue about Samza or NiFi...)

    You can add as many source operators as you want and each can consume from a different source.

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    
    Properties properties = ... // Kafka connection settings; see the Flink documentation for details
    
    // Each source operator consumes independently from its own source
    DataStream<String> stream1 = env.addSource(new FlinkKafkaConsumer08<>("topic", new SimpleStringSchema(), properties));
    DataStream<String> stream2 = env.readTextFile("/tmp/myFile.txt");
    
    // Streams of the same element type can be merged into one stream
    DataStream<String> allStreams = stream1.union(stream2);
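
    Note that union() requires both streams to have the same element type. If the sources emit different types (for example, Kafka strings and Twitter status objects), the streams can instead be combined with connect() and a CoMapFunction. A minimal sketch, reusing env and properties from above and using a generated number sequence as the second source:

    // connect() pairs two streams of possibly different types;
    // the CoMapFunction maps both sides to a common output type
    DataStream<String> kafkaStream = env.addSource(new FlinkKafkaConsumer08<>("topic", new SimpleStringSchema(), properties));
    DataStream<Long> numberStream = env.generateSequence(0, 100);
    
    DataStream<String> merged = kafkaStream
        .connect(numberStream)
        .map(new CoMapFunction<String, Long, String>() {
            @Override
            public String map1(String kafkaRecord) { // elements from the first stream
                return kafkaRecord;
            }
            @Override
            public String map2(Long number) { // elements from the second stream
                return String.valueOf(number);
            }
        });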
    

    For Storm using the low-level API, the pattern is similar. See: An Apache Storm bolt receive multiple input tuples from different spout/bolt
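
    For illustration, here is a minimal Storm topology sketch in which one bolt subscribes to two spouts (the KafkaSpout/TwitterSpout/MergeBolt classes are hypothetical placeholders, not taken from the linked answer):

    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("kafka-spout", new KafkaSpout());     // placeholder spout reading from Kafka
    builder.setSpout("twitter-spout", new TwitterSpout()); // placeholder spout reading from Twitter
    // The bolt is subscribed to both spouts and receives tuples from each
    builder.setBolt("merge-bolt", new MergeBolt())
           .shuffleGrouping("kafka-spout")
           .shuffleGrouping("twitter-spout");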