I have the RDD:
I want to reduce it by key and get, for each key, the average of each position in the tuple. For example, given this small input:
[(1,((0,19,15,39),1)),(1,((0,64,19,3),1))]
I will get:
[(1,((0,83,34,42),2))]
then (or directly)
[(1,(0,41.5,17,21))]
I tried:
reduceByKey(lambda a,b: a+b)
reduceByKey(lambda a,b: (a[0]+b[0],a[1]+b[1]))
and a few other things, but they either didn't help or raised an RDD error.
How can I solve the issue?
You need two steps to get the average per key: sum the tuples element-wise (and add up the counts) in reduceByKey, then divide each sum by the count in a map:
result = rdd.reduceByKey(lambda a, b: (tuple(i+j for (i,j) in zip(a[0],b[0])), a[1]+b[1])) \
    .map(lambda r: (r[0], tuple(i/r[1][1] for i in r[1][0])))
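To see why the two attempts in the question fail, and to check the logic without a Spark context, here is a minimal plain-Python sketch of the same two steps. It assumes Python 3, uses `functools.reduce` as a stand-in for `reduceByKey` on a single key, and the helper names `combine` and `average` are made up for illustration.

```python
from functools import reduce

def combine(a, b):
    # a and b are (sums_tuple, count) values belonging to the same key
    sums = tuple(i + j for i, j in zip(a[0], b[0]))
    return (sums, a[1] + b[1])

def average(value):
    # Divide every summed position by the number of tuples seen
    sums, count = value
    return tuple(s / count for s in sums)

# The two values for key 1 from the question
values = [((0, 19, 15, 39), 1), ((0, 64, 19, 3), 1)]

total = reduce(combine, values)
print(total)           # ((0, 83, 34, 42), 2)
print(average(total))  # (0.0, 41.5, 17.0, 21.0)

# First attempt: `+` on tuples concatenates instead of adding element-wise
print(values[0] + values[1])
# ((0, 19, 15, 39), 1, (0, 64, 19, 3), 1)

# Second attempt: the count is summed, but a[0]+b[0] still concatenates
print(values[0][0] + values[1][0])
# (0, 19, 15, 39, 0, 64, 19, 3)
```

Note that on Python 2 the `/` in the map step truncates integers, so you would need to divide by `float(r[1][1])` to get 41.5 rather than 41.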