
Group items in a list and calculate sums


I have a list with weekly figures and need to obtain the grouped totals by month.

The following code does the job, but there should be a more Pythonic way of doing it using the standard library. The drawback of the code below is that it only works if the list is sorted by date.

#Test data (not sorted)
sum_weekly=[('2020/01/05', 59), ('2020/01/19', 88), ('2020/01/26', 95), ('2020/02/02', 89),
 ('2020/02/09', 113), ('2020/02/16', 90), ('2020/02/23', 68), ('2020/03/01', 74), ('2020/03/08', 85),
  ('2020/04/19', 6), ('2020/04/26', 5), ('2020/05/03', 14),
 ('2020/05/10', 5), ('2020/05/17', 20), ('2020/05/24', 28),('2020/03/15', 56), ('2020/03/29', 5), ('2020/04/12', 2),]

month = sum_weekly[0][0].split('/')[1]
count=0
out=[]
for item in sum_weekly:
    m_sel = item[0].split('/')[1]
    if m_sel!=month:
        out.append((month, count))
        count=item[1]
    else:
        count+=item[1]
    month = m_sel
out.append((month, count))

# Desired monthly sums: ('01', 242), ('02', 360), ('03', 220), ('04', 13), ('05', 67)
# (this loop only produces them when the list is sorted by date)
print(out)

Solution

  • You could use a defaultdict to store the result instead of a list. The keys of the dictionary are the months, so you can simply add up the values that share the same month (key). Because each total is looked up by key, the input no longer needs to be sorted.

    Possible implementation:

    from collections import defaultdict
    
    # Test data
    sum_weekly = [('2020/01/05', 59), ('2020/01/19', 88), ('2020/01/26', 95), ('2020/02/02', 89),
                  ('2020/02/09', 113), ('2020/02/16', 90), ('2020/02/23', 68), ('2020/03/01', 74), ('2020/03/08', 85),
                  ('2020/03/15', 56), ('2020/03/29', 5), ('2020/04/12', 2), ('2020/04/19', 6), ('2020/04/26', 5),
                  ('2020/05/03', 14),
                  ('2020/05/10', 5), ('2020/05/17', 20), ('2020/05/24', 28)]
    
    
    results = defaultdict(int)
    for date, count in sum_weekly:  # unpack each tuple for clarity
        month = date.split('/')[1]
        # With defaultdict(int), a missing key is created on first
        # access and initialized to zero, so += works without a check.
        results[month] += count
    
    print(results)
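
If you need the result in the same list-of-tuples form as in the question, the dictionary can be sorted by key afterwards. The following is only a minimal sketch of that idea: the function name `monthly_totals` and the shortened `sample` list are made up for illustration, and it assumes every date string has the form 'YYYY/MM/DD'.

```python
from collections import defaultdict


def monthly_totals(weekly):
    """Sum (date, count) pairs per month; input order does not matter."""
    totals = defaultdict(int)
    for date, count in weekly:
        # Dates are assumed to look like 'YYYY/MM/DD', so index 1 is the month.
        totals[date.split('/')[1]] += count
    # Sort by month key to get an ordered list of (month, total) tuples.
    return sorted(totals.items())


# A few rows from the question's data, deliberately out of order.
sample = [('2020/03/01', 74), ('2020/01/05', 59), ('2020/03/08', 85), ('2020/01/19', 88)]
print(monthly_totals(sample))  # [('01', 147), ('03', 159)]
```

Note that keying on the month alone merges the same month from different years. If the data can span more than one year, a key such as `date[:7]` ('YYYY/MM') would keep them apart.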