I have an RDD whose records are as follows:
key1 value1
key1 value2
key2 value3
key3 value4
key3 value5
I want to get an RDD that keeps only one record per distinct key, as follows:
key1 value1
key2 value3
key3 value4
I can only use the spark-core APIs, and I don't want to aggregate the values of the same key.
You could do this with PairRDDFunctions.reduceByKey. Assuming you have an RDD[(K, V)]:
rdd.reduceByKey((a, b) => if (someCondition) a else b)
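For the sample data in the question, a minimal sketch could look like the following. It assumes a local SparkContext (the app name and master URL are placeholders chosen for illustration) and simply keeps the left-hand value in every merge. Note that reduceByKey does not promise which of a key's values survives, since the merge order depends on how the records are partitioned, so the result may not match the expected output exactly when a key's values are spread across partitions.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object DistinctKeys {
  def main(args: Array[String]): Unit = {
    // App name and local master are assumptions for this sketch only.
    val conf = new SparkConf().setAppName("distinct-keys").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // The sample records from the question, as an RDD[(String, String)].
    val rdd = sc.parallelize(Seq(
      ("key1", "value1"), ("key1", "value2"),
      ("key2", "value3"),
      ("key3", "value4"), ("key3", "value5")
    ))

    // Keep one value per key by always returning the left operand.
    // Which value is kept is not guaranteed across partitions.
    val onePerKey = rdd.reduceByKey((a, _) => a)

    onePerKey.collect().foreach(println)
    sc.stop()
  }
}
```

On older Spark releases (before 1.3), an additional `import org.apache.spark.SparkContext._` is needed to bring the pair-RDD implicits into scope. If a specific value per key is required (for example, the smallest one), the function passed to reduceByKey can compare a and b instead of always returning a.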