How to calculate the average of x and y coordinates by key in an RDD?

I have an RDD of key-value pairs, where each key is a centroid and each value is a list of the points nearest to that centroid.

data = [('d1',
  [(4.832, 1.963),
   (5.439, 2.147),
   (5.009, 2.522)]),
('d2',
  [(4.26, 2.033),
   (5.24, 1.642),
   (4.814, 2.033)]),
('d3',
  [(4.646, 1.827),
   (5.137, 1.858),
   (5.288, 1.842)])]

I am trying to calculate the average of the x and y coordinates separately for each centroid. I would like to generate the output below:

[('d1',(5.09, 2.21)),
('d2',(4.77, 1.9)),
('d3',(5.02, 1.84))]

I have tried the following code, but I am not getting any result.

data.reduceByKey(lambda x,y: mean(x[1],y[1])).collect()

I am kinda stuck here and would really appreciate some help on this.

Solution

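You don't need to reduce by key, because the data is already grouped by key: each key appears only once, so reduceByKey has nothing to combine. You just need to calculate the mean of each entry's list of points, for example with numpy.mean and axis=0, which averages the x and y columns separately.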

import numpy as np

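# data is the RDD of (key, list_of_points) pairs from the question.
# axis=0 averages column-wise: one mean for x, one for y.
avg_data = data.map(lambda r: (r[0], tuple(np.mean(r[1], axis=0))))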

avg_data.collect()
# [('d1', (5.093333333333334, 2.2106666666666666)), 
#  ('d2', (4.771333333333334, 1.9026666666666667)), 
#  ('d3', (5.023666666666666, 1.8423333333333334))]
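
If NumPy is not available on the workers, or you want the rounded values shown in the question, a plain-Python helper should work as well. This is only a minimal sketch: `mean_point` and its `ndigits` parameter are hypothetical names introduced for illustration, and the Spark call in the final comment assumes an existing SparkContext named `sc`.

```python
from statistics import fmean

data = [
    ('d1', [(4.832, 1.963), (5.439, 2.147), (5.009, 2.522)]),
    ('d2', [(4.26, 2.033), (5.24, 1.642), (4.814, 2.033)]),
    ('d3', [(4.646, 1.827), (5.137, 1.858), (5.288, 1.842)]),
]

def mean_point(points, ndigits=2):
    """Return the per-coordinate mean of a list of (x, y) tuples, rounded."""
    # zip(*points) transposes [(x1, y1), (x2, y2), ...]
    # into (x1, x2, ...) and (y1, y2, ...).
    return tuple(round(fmean(axis), ndigits) for axis in zip(*points))

# Local check without Spark: apply the helper to each (key, points) pair.
result = [(key, mean_point(points)) for key, points in data]
print(result)
# [('d1', (5.09, 2.21)), ('d2', (4.77, 1.9)), ('d3', (5.02, 1.84))]

# With Spark (assumption: `sc` is an existing SparkContext):
# sc.parallelize(data).mapValues(mean_point).collect()
```

mapValues transforms only the value and leaves the key untouched, so the lambda no longer has to rebuild the (key, value) tuple by hand.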