Tags: python, list, vector, list-comprehension, average

Calculating the average vector for each unique element in a list


I have a list of the form:

mylist = [([256, 408, 147, 628], 'size'), ([628, 526, 236, 676], 'camera'),
          ([526, 876, 676, 541], 'camera'), ([567, 731, 724, 203], 'size'), .....]

It has more than 8000 entries.

It contains many duplicates: there are only 100 unique words in the list, so I would like to reduce it to 100 entries (one per unique word) by taking the average vector over every occurrence of each word.

For example, my new list will have the form:

newlist = [([411.5, 569.5, 435.5, 415.5], 'size'), .....]
# I have taken the average of the 'size' vectors here and want to repeat this for each unique word

and will be of length 100.

How would I do this?


Solution

  • You can do this by collecting all the vectors for each key (word) into a dict, then computing the element-wise average of the lists stored under that key. Something like:

    from statistics import mean
    
    data = [([1, 2, 3, 4], 'size'), ([10, 20, 30, 40], 'camera'),
     ([100, 200, 300, 400], 'camera'), ([10, 20, 30, 40], 'size')]
    
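    # Group the vectors by their word (the last item of each tuple).
    ddata = {}
    for entry in data:
        key = entry[-1]
        if key not in ddata:
            ddata[key] = []
        ddata[key].append(entry[0])
    
    #print(ddata)
    
    # zip(*v) transposes the vectors so that mean() sees one column at a time.
    out = []
    for k, v in ddata.items():
        out.append((list(map(mean, zip(*v))), k))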
    
    print(out)
    # [([5.5, 11, 16.5, 22], 'size'), ([55, 110, 165, 220], 'camera')]
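
  • As a variation on the same grouping idea, here is a minimal sketch that wraps it in a function and runs it on the four sample entries from the question. The function name average_by_word is made up for illustration, and it assumes every vector has the same length. It uses collections.defaultdict in place of the explicit key check, and sum(col) / len(col) in place of statistics.mean, so every average comes back as a float:

```python
from collections import defaultdict


def average_by_word(pairs):
    """Return [(mean_vector, word), ...] with one entry per unique word.

    Hypothetical helper: assumes each item is (vector, word) and that
    all vectors have the same length.
    """
    grouped = defaultdict(list)  # word -> list of vectors seen for it
    for vector, word in pairs:
        grouped[word].append(vector)

    # zip(*vectors) transposes the vectors, so each `col` holds the
    # values of one coordinate across all occurrences of the word.
    return [
        ([sum(col) / len(col) for col in zip(*vectors)], word)
        for word, vectors in grouped.items()
    ]


# The four sample entries shown in the question.
mylist = [([256, 408, 147, 628], 'size'), ([628, 526, 236, 676], 'camera'),
          ([526, 876, 676, 541], 'camera'), ([567, 731, 724, 203], 'size')]

newlist = average_by_word(mylist)
print(newlist)
# [([411.5, 569.5, 435.5, 415.5], 'size'), ([577.0, 701.0, 456.0, 608.5], 'camera')]
```

    The 'size' entry matches the average given in the question. Words appear in the output in the order they are first seen in the input, because dicts preserve insertion order in Python 3.7+.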