Tags: java, apache-spark, spark-streaming

How to add more RDDs to an existing RDD in Spark?


I have an RDD and want to add more RDDs to it. How can I do this in Spark? My code looks like the following; I want to return an RDD built from the DStream I have.

JavaDStream<Object> newDStream = dStream.map(this);
JavaRDD<Object> rdd = context.sparkContext().emptyRDD();
return newDStream.wrapRDD(context.sparkContext().emptyRDD());

I cannot find much documentation about the wrapRDD method of the JavaDStream class provided by Apache Spark.


Solution

  • You can use JavaStreamingContext.queueStream and fill it with a Queue<JavaRDD<YourType>>:

    public JavaInputDStream<Object> FillDStream() {
        // queueStream expects a java.util.Queue of JavaRDDs; a LinkedList works as one.
        Queue<JavaRDD<Object>> rdds = new LinkedList<JavaRDD<Object>>();
        rdds.add(context.sparkContext().emptyRDD());
        rdds.add(context.sparkContext().emptyRDD());

        JavaInputDStream<Object> filledDStream = context.queueStream(rdds);
        return filledDStream;
    }
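
  • For context, here is a minimal, self-contained sketch of how such a queue-backed stream could be driven end to end. The class name QueueStreamExample, the local[2] master, the one-second batch interval, and the sample data are illustrative assumptions, not part of the original answer:

    import java.util.Arrays;
    import java.util.LinkedList;
    import java.util.Queue;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;

    public class QueueStreamExample {
        public static void main(String[] args) throws InterruptedException {
            // Hypothetical local setup for the sketch.
            SparkConf conf = new SparkConf().setMaster("local[2]").setAppName("QueueStreamExample");
            JavaStreamingContext context = new JavaStreamingContext(conf, Durations.seconds(1));

            // Each RDD placed in the queue becomes one batch of the stream
            // (queueStream consumes one RDD per batch interval by default).
            Queue<JavaRDD<Integer>> rdds = new LinkedList<>();
            rdds.add(context.sparkContext().parallelize(Arrays.asList(1, 2, 3)));
            rdds.add(context.sparkContext().parallelize(Arrays.asList(4, 5, 6)));

            JavaDStream<Integer> stream = context.queueStream(rdds);
            stream.print();

            context.start();
            context.awaitTerminationOrTimeout(5000);
            context.stop();
        }
    }

    Since queueStream reads one RDD from the queue per batch interval by default, RDDs added to the queue later simply show up as later batches, which is the closest streaming analogue to "adding more RDDs" to an existing stream.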