I am trying to combine two JavaPairRDDs so that I can run a reduceByKey job on the combined dataset, like below:
JavaPairRDD data1 = ...
JavaPairRDD data2 = ...
I want to have a new dataset which contains both data1 and data2, something like:
JavaPairRDD data_total = (data1 + data2)
So that I can do a reduce by key on the combined dataset:
JavaPairRDD output = data_total.reduceByKey(... my reduce function ...);
What's the best way to combine data1 and data2? Or what's the best approach to this problem?
Thanks a lot!
You can use union:
// Return the union of this RDD and another one.
JavaPairRDD<K,V> union(JavaPairRDD<K,V> other)
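In Spark this becomes data1.union(data2).reduceByKey(...). union simply concatenates the two RDDs and keeps duplicate pairs, so reduceByKey sees every value from both inputs. The sketch below is not Spark code. It is a plain-Java model of what that pipeline computes, using lists of Map.Entry in place of RDDs so it runs without a cluster. The class and method names (UnionReduceSketch, union, reduceByKey) are hypothetical and only mirror the Spark operations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;

public class UnionReduceSketch {
    // Hypothetical model of RDD.union: concatenates both inputs.
    // Duplicates are kept and nothing is merged at this stage.
    public static <K, V> List<Map.Entry<K, V>> union(List<Map.Entry<K, V>> a,
                                                     List<Map.Entry<K, V>> b) {
        List<Map.Entry<K, V>> out = new ArrayList<>(a);
        out.addAll(b);
        return out;
    }

    // Hypothetical model of reduceByKey: folds all values sharing a key with f.
    // In Spark, f must be associative and commutative, because partial
    // results from different partitions are combined in no fixed order.
    public static <K extends Comparable<K>, V> Map<K, V> reduceByKey(
            List<Map.Entry<K, V>> pairs, BinaryOperator<V> f) {
        Map<K, V> out = new TreeMap<>();
        for (Map.Entry<K, V> e : pairs) {
            out.merge(e.getKey(), e.getValue(), f);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> data1 =
                List.of(Map.entry("a", 1), Map.entry("b", 2));
        List<Map.Entry<String, Integer>> data2 =
                List.of(Map.entry("a", 3), Map.entry("c", 4));

        // Equivalent in spirit to data1.union(data2).reduceByKey(Integer::sum)
        Map<String, Integer> output = reduceByKey(union(data1, data2), Integer::sum);
        System.out.println(output); // {a=4, b=2, c=4}
    }
}
```

Key "a" appears in both inputs, so its values 1 and 3 are summed to 4, while "b" and "c" pass through unchanged.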