python list vector list-comprehension average

Calculating the average vector for each unique element in a list

I have a list of the form:

mylist =[([256, 408, 147, 628], 'size'), ([628, 526, 236, 676], 'camera'),
 ([526, 876, 676, 541], 'camera'), ([567, 731, 724, 203], 'size'),.....]

It has more than 8000 entries.

It contains many duplicate entries: there are actually only 100 unique words in this list, so I would like to reduce it to 100 entries (one per unique word) by taking the average vector of every occurrence of that word.
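The length of the result equals the number of distinct words. A quick way to confirm that figure is to collect the words into a set. This is a minimal sketch that uses the four sample entries above in place of the full list:

```python
# Sample entries standing in for the full 8000+ entry list
mylist = [([256, 408, 147, 628], 'size'), ([628, 526, 236, 676], 'camera'),
          ([526, 876, 676, 541], 'camera'), ([567, 731, 724, 203], 'size')]

# The second item of each tuple is the word; a set keeps one copy of each
unique_words = {word for _, word in mylist}
print(len(unique_words))  # 2 for this sample
```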

For example, my new list will have the form:

newlist = [([411.5, 569.5, 435.5, 415.5], 'size'), .....]  # I have taken the average values of 'size' here and want to repeat this for each unique word

and will be of length 100.
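Each averaged value is the mean of the values at the same position across all occurrences of the word. For 'size', the two sample vectors give (256 + 567) / 2 = 411.5, (408 + 731) / 2 = 569.5, and so on. The following snippet checks this arithmetic against the expected 'size' entry:

```python
size_vectors = [[256, 408, 147, 628], [567, 731, 724, 203]]

# zip(*...) groups the values at each position: (256, 567), (408, 731), ...
avg = [sum(column) / len(column) for column in zip(*size_vectors)]
print(avg)  # [411.5, 569.5, 435.5, 415.5]
```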

How would I do this?

Solution

You can do this by collecting all the vectors for each key (word) into a dict, then averaging the vectors assigned to each key position by position. Something like:

from statistics import mean

data = [([1, 2, 3, 4], 'size'), ([10, 20, 30, 40], 'camera'),
        ([100, 200, 300, 400], 'camera'), ([10, 20, 30, 40], 'size')]

# Group the vectors by word: {'size': [[1, 2, 3, 4], [10, 20, 30, 40]], ...}
ddata = {}
for vector, key in data:
    ddata.setdefault(key, []).append(vector)

# zip(*v) groups the values at each position across all vectors for a word;
# mean then averages each of those groups
out = []
for k, v in ddata.items():
    out.append((list(map(mean, zip(*v))), k))

print(out)
# [([5.5, 11, 16.5, 22], 'size'), ([55, 110, 165, 220], 'camera')]
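For the full 8000+ entry list, one variation (not part of the answer above) is to keep a running per-position total and an occurrence count for each word instead of storing every vector. The sketch below assumes all vectors for a given word have the same length, because zip silently truncates to the shorter input. Unlike statistics.mean, which can return an int such as 11 when the average is a whole number, this version always produces floats:

```python
from collections import defaultdict

data = [([1, 2, 3, 4], 'size'), ([10, 20, 30, 40], 'camera'),
        ([100, 200, 300, 400], 'camera'), ([10, 20, 30, 40], 'size')]

sums = {}                   # word -> per-position running totals
counts = defaultdict(int)   # word -> number of occurrences seen so far

for vector, key in data:
    if key not in sums:
        sums[key] = [0] * len(vector)
    # Add this vector to the running totals, position by position
    sums[key] = [total + value for total, value in zip(sums[key], vector)]
    counts[key] += 1

# Divide each total by the occurrence count to get the mean vector
out = [([total / counts[key] for total in totals], key)
       for key, totals in sums.items()]
print(out)
# [([5.5, 11.0, 16.5, 22.0], 'size'), ([55.0, 110.0, 165.0, 220.0], 'camera')]
```

Dicts preserve insertion order, so words appear in the output in the order they first occur in the input.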