python-2.7, group-by, defaultdict

Using Python groupby or defaultdict effectively?


I have a CSV with name, role, and years of experience. I want to create a list of tuples that aggregates (name, role1, total_exp_inthisRole) for all the employees. So far I have been able to use defaultdict to do the following:

import csv, urllib2
from collections import defaultdict 

response = urllib2.urlopen(url)  # url: location of the CSV file (not shown here)
cr = csv.reader(response)
parsed = ((row[0],row[1],int(row[2])) for row in cr)    
employees =[]
for item in parsed:
    employees.append(tuple(item))
employeeExp = defaultdict(int)
for x,y,z in employees: # variable unpacking
    employeeExp[x] += z    
employeeExp.items()

output: [('Ken', 15), ('Buckky', 5), ('Tina', 10)]

But how do I also use the second column to get the result I want? Should I try to group by multiple keys, or is there a simpler way? Thanks in advance.


Solution

  • You can simply use a tuple of name and role as the key of your defaultdict, instead of the name alone:

    for x,y,z in employees:
        employeeExp[(x, y)] += z 
    

    For your second expected output ([('Ken', ('engineer', 5),('sr. engineer', 6)), ...]), you need to aggregate the result of the snippet above one more time, this time using a defaultdict with a list:

    d = defaultdict(list)
    
    for (name, rol), total_exp_inthisRole in employeeExp.items():
        d[name].append((rol, total_exp_inthisRole))  # append takes one argument, so pass a tuple
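
To see both aggregation steps working together, here is a minimal end-to-end sketch. It is not the asker's actual data: SAMPLE_LINES is made up to stand in for the CSV behind the URL, and the function names (parse, totals_by_name_and_role, roles_by_name, as_tuples, roles_by_name_groupby) are hypothetical helpers introduced only for illustration. Since the question also asks about groupby, the last function shows one possible itertools.groupby variant; note that groupby only merges adjacent rows, so the input has to be sorted first. The code avoids urllib2 so that it should run on both Python 2.7 and Python 3.

```python
import csv
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows standing in for the CSV fetched from the URL.
SAMPLE_LINES = [
    "Ken,engineer,2",
    "Ken,engineer,3",
    "Ken,sr. engineer,6",
    "Tina,analyst,10",
    "Buckky,engineer,5",
]


def parse(lines):
    """Turn CSV lines into a list of (name, role, years) tuples."""
    return [(row[0], row[1], int(row[2])) for row in csv.reader(lines)]


def totals_by_name_and_role(employees):
    """Step 1: sum the years per (name, role) pair using a tuple key."""
    totals = defaultdict(int)
    for name, role, years in employees:
        totals[(name, role)] += years
    return totals


def roles_by_name(totals):
    """Step 2: collect the (role, total) pairs under each name."""
    grouped = defaultdict(list)
    # Sorting only makes the order of roles deterministic.
    for (name, role), years in sorted(totals.items()):
        grouped[name].append((role, years))
    return dict(grouped)


def as_tuples(grouped):
    """Flatten {name: [(role, total), ...]} into (name, (role, total), ...) tuples."""
    return [(name,) + tuple(roles) for name, roles in sorted(grouped.items())]


def roles_by_name_groupby(employees):
    """Alternative using itertools.groupby on rows sorted by (name, role)."""
    rows = sorted(employees, key=itemgetter(0, 1))
    result = {}
    for name, name_rows in groupby(rows, key=itemgetter(0)):
        result[name] = [
            (role, sum(r[2] for r in role_rows))
            for role, role_rows in groupby(name_rows, key=itemgetter(1))
        ]
    return result


employees = parse(SAMPLE_LINES)
totals = totals_by_name_and_role(employees)
result = as_tuples(roles_by_name(totals))
# result == [('Buckky', ('engineer', 5)),
#            ('Ken', ('engineer', 5), ('sr. engineer', 6)),
#            ('Tina', ('analyst', 10))]
```

The defaultdict version makes a single pass per step and does not care about row order, whereas the groupby version needs the sort up front. For a small CSV either should be fine; which one reads better is mostly a matter of taste.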