I have a pair RDD like this:
id value
id1 set(1232, 3,1,93,35)
id2 set(321,42,5,13)
id3 set(1233,3,5)
id4 set(1232, 56,3,35,5)
Now, I want to get the total count of ids per value contained in the set. So the output for the above table should be something like this:
set value count
1232 2
1 1
93 1
35 2
3 3
5 3
321 1
42 1
13 1
1233 1
56 1
Is there a way to achieve this?
yourrdd.toDF().withColumn(“_2”,explode(col(“_2”))).groupBy(“_2”).count.show