(K1, (v1, v2))
(K2, (v3, v4))
(K1, (v1, v5))
(K2, (v3, v6))
How can I sum the values for each key, provided the first value is the same, so that I get (K1, (v1, v2+v5)), (K2, (v3, v4+v6))?
IIUC, you need to change the key before the reduce, so that the key and the first value together form a composite key, and then map your values back to the desired format.
You should be able to do the following:
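# Re-key by (key, first value), sum the second values, then restore the original shape.
# reduceByKey needs a two-argument function; the built-in sum expects an iterable.
new_rdd = rdd.map(lambda row: ((row[0], row[1][0]), row[1][1]))\
    .reduceByKey(lambda a, b: a + b)\
    .map(lambda row: (row[0][0], (row[0][1], row[1])))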
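
To see what the three steps do without a Spark cluster, here is a minimal plain-Python sketch of the same logic. The sample values and the helper name `sum_by_key_and_first` are made up for illustration (the question only gives symbolic v1, v2, ...), and a dictionary stands in for `reduceByKey`, so treat it as a sketch of the idea rather than Spark code.

```python
from collections import defaultdict

# Hypothetical sample data shaped like the question's records: (key, (first, second)).
records = [
    ("K1", ("a", 2)),
    ("K2", ("b", 4)),
    ("K1", ("a", 5)),
    ("K2", ("b", 6)),
]

def sum_by_key_and_first(rows):
    """Mimic map -> reduceByKey -> map on a plain list."""
    totals = defaultdict(int)
    for key, (first, second) in rows:
        # Steps 1 and 2: build the composite key (key, first) and add up the second values.
        totals[(key, first)] += second
    # Step 3: restore the (key, (first, total)) shape.
    return [(key, (first, total)) for (key, first), total in totals.items()]

result = sum_by_key_and_first(records)
print(sorted(result))  # [('K1', ('a', 7)), ('K2', ('b', 10))]
```

Records that share a key but differ in the first value end up under different composite keys, so they are not summed together.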