Tags: python, list, dictionary, list-comprehension, dictionary-comprehension

Average of elements in a list of lists grouped by first item in the list


My list looks like my_list = [['A', 6, 7], ['A', 4, 8], ['B', 9, 3], ['C', 1, 1], ['B', 10, 7]]

I want to find the averages of the other two columns in each of the inner lists, grouped by the first column of each inner list. The expected result is:

[['A', 5, 7.5], ['B', 9.5, 5], ['C', 1, 1]]

['A', 5, 7.5] comes from ['A', (6+4)/2 ,(7+8)/2]

I don't mind if I end up getting a dictionary or something, but I would prefer it remain a list.

I've tried the following:


  1. Built a dictionary from the first column and the remaining columns:

    my_list1 = [i[0] for i in my_list]
    my_list2 = [i[1:] for i in my_list]
    new_dict = {k: v for k, v in zip(my_list1, my_list2)}

Splitting the original list so that the first column becomes the key and the second and third columns become the value, and then converting it to a dictionary, gives me the aggregate. The problem is that I want to preserve the decimal places, but it rounds up and gives me whole numbers instead of float values:

my_list1 = ['A', 'A', 'B', 'C', 'B']

my_list2 = [[6, 7], [4, 8], [9, 3], [1, 1], [10, 7]]

new_dict= {'A': [5, 8], 'B': [10, 5], 'C': [1, 1]}

when what I would ideally want is [['A', 5, 7.5], ['B', 9.5, 5], ['C', 1, 1]] (I don't mind if it's a dictionary).


  2. Converted the second and third columns to float using a for loop, thinking it would then give me floats when I convert the list to a dictionary. But it makes no difference: it rounds up and gives a whole number.

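    # Convert the numeric columns to float in place
    for i in range(len(my_list)):
        for j in range(1, len(my_list[i])):
            my_list[i][j] = float(my_list[i][j])

    # Build a dictionary keyed by the first column
    # (avoid naming it dict, which would shadow the built-in)
    new_dict = {}

    for l2 in my_list:
        new_dict[l2[0]] = l2[1:]
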

The reason I need to preserve the decimal places is that the second and third columns are x and y coordinates.

So, all in all, the objective is to find the averages of the other two columns in each of the inner lists, grouped by the first column of each inner list, with as many decimal places as possible.


Solution

  • Assuming you meant to use the following list:

    In [4]: my_list = [['A', 6, 7], ['A', 4, 8], ['B', 9, 3], ['C', 1, 1], ['B', 10, 7]]
    

    Then simply use a defaultdict to group by the first element, and find the mean of each column:

    In [6]: from collections import defaultdict
    
    In [7]: grouper = defaultdict(list)
    
    In [8]: for k, *tail in my_list:
        ...:     grouper[k].append(tail)
        ...:
    
    In [9]: grouper
    Out[9]:
    defaultdict(list,
                {'A': [[6, 7], [4, 8]], 'B': [[9, 3], [10, 7]], 'C': [[1, 1]]})
    
    In [10]: import statistics
    
    In [11]: {k: list(map(statistics.mean, zip(*v))) for k,v in grouper.items()}
    Out[11]: {'A': [5, 7.5], 'B': [9.5, 5], 'C': [1, 1]}
    
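    Since the question prefers a list over a dictionary, the same grouping can be turned back into a list of lists. This is a minimal sketch, not part of the session above; the name averaged is only illustrative, and the group order relies on dictionaries preserving insertion order (Python 3.7+):

```python
from collections import defaultdict
import statistics

my_list = [['A', 6, 7], ['A', 4, 8], ['B', 9, 3], ['C', 1, 1], ['B', 10, 7]]

# Group the numeric columns by the first element, as in the session above.
grouper = defaultdict(list)
for k, *tail in my_list:
    grouper[k].append(tail)

# zip(*rows) transposes the grouped rows into columns, so each column
# can be averaged on its own; the key is then put back in front.
averaged = [[k] + [statistics.mean(col) for col in zip(*rows)]
            for k, rows in grouper.items()]

print(averaged)
```

    With the list above this should give one inner list per key, with the key first and the two column means after it.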

    Note, if you are on Python 2, no need to call list after map. Also, you should use iteritems instead of items.

    Also, since Python 2 does not support the starred unpacking used in the loop (for k, *tail in my_list), you will have to do something like:

    for sub in my_list:
        grouper[sub[0]].append(sub[1:])
    

    instead of the cleaner version available on Python 3.

    Finally, there is no statistics module in Python 2. So just do:

    def mean(seq):
        return float(sum(seq))/len(seq)
    

    and use that mean instead of statistics.mean.
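
    Putting the pieces together, here is one possible self-contained version that avoids both starred unpacking and the statistics module, so it should run on Python 2.7 as well as Python 3. The helper name average_by_first is hypothetical and only used for illustration. Because this mean always divides a float, every average comes back as a float:

```python
from collections import defaultdict


def mean(seq):
    # Plain arithmetic mean; float() forces true division on Python 2 as well.
    return float(sum(seq)) / len(seq)


def average_by_first(rows):
    # Hypothetical helper: group rows by their first element, then average
    # each remaining column within the group.
    grouper = defaultdict(list)
    for sub in rows:
        grouper[sub[0]].append(sub[1:])
    return {k: [mean(col) for col in zip(*v)] for k, v in grouper.items()}


my_list = [['A', 6, 7], ['A', 4, 8], ['B', 9, 3], ['C', 1, 1], ['B', 10, 7]]
result = average_by_first(my_list)
print(result)
```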