
How to create a new layer of sublists based on a common key within each sublist in order to categorize the sublists?


In other words, how do I place the sublists of a list into new sublists so that, within each new sublist, the item at index 1 is the same?

For example, I'd like to turn the following list of sublists into a list in which sublists that share the same item at index 1 are grouped together in a new sublist. Here, the apple, banana and orange sublists should each end up in their own new sublist.

lsta = [['2014W01','apple',21,'[email protected]'],['2014W02','apple',19,'[email protected]'],['2014W02','banana',51,'[email protected]'],['2014W03','apple',100,'[email protected]'],['2014W01','banana',71,'[email protected]'],['2014W02','organge',21,'[email protected]']]

I'd like the three apple sublists to be contained in one new sublist, the two banana sublists in another, and so on.

Desired_List = [[['2014W01','apple',21,'[email protected]'],['2014W02','apple',19,'[email protected]'],['2014W03','apple',100,'[email protected]']],[['2014W02','banana',51,'[email protected]'],['2014W01','banana',71,'[email protected]']],[['2014W02','organge',21,'[email protected]']]]

Bonus points if you can tell me how to do multiple categorizations (e.g. separating not only by fruit type, but also by week).


Solution

  • I'll take a slightly different tack. You probably want your group-by field to be the lookup key in a dict. Each value can just be a list of... whatever you want to call each sublist here. I'll call each one a FruitPerson.

    from collections import defaultdict, namedtuple
    
    FruitPerson = namedtuple('FruitPerson','id age email')
    
    d = defaultdict(list)
    
    for sublist in lsta:
        d[sublist[1]].append(FruitPerson(sublist[0],*sublist[2:]))
    

    Then, for example:

    d['apple']
    Out[19]: 
    [FruitPerson(id='2014W01', age=21, email='[email protected]'),
     FruitPerson(id='2014W02', age=19, email='[email protected]'),
     FruitPerson(id='2014W03', age=100, email='[email protected]')]
    
    d['apple'][0]
    Out[20]: FruitPerson(id='2014W01', age=21, email='[email protected]')
    
    d['apple'][0].id
    Out[21]: '2014W01'
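
If the exact list-of-lists shape from the question is wanted rather than a dict of namedtuples, the same defaultdict grouping can keep the original sublists untouched. This is a minimal sketch, assuming Python 3.7+ (where dicts preserve insertion order); the names groups and desired_list are made up for illustration, and the email strings are the placeholders as they appear in the question.

```python
from collections import defaultdict

# Sample rows copied from the question (emails are placeholders in the source).
lsta = [
    ['2014W01', 'apple', 21, '[email protected]'],
    ['2014W02', 'apple', 19, '[email protected]'],
    ['2014W02', 'banana', 51, '[email protected]'],
    ['2014W03', 'apple', 100, '[email protected]'],
    ['2014W01', 'banana', 71, '[email protected]'],
    ['2014W02', 'organge', 21, '[email protected]'],
]

# Group the unmodified sublists by the item at index 1.
groups = defaultdict(list)
for sublist in lsta:
    groups[sublist[1]].append(sublist)

# Dicts keep insertion order (Python 3.7+), so the groups come out
# in the order each key was first seen.
desired_list = list(groups.values())
```

Here desired_list holds one inner list per distinct value at index 1, and groups stays available for lookups by key.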
    

    Edit: okay, multiple-categorization-bonus-point question. You just need to nest your dictionaries. The syntax gets a little goofy because the argument to defaultdict has to be a callable; you can do this with either lambda or functools.partial:

    FruitPerson = namedtuple('FruitPerson','age email') #just removed 'id' field
    d = defaultdict(lambda: defaultdict(list))
    
    for sublist in lsta:
        d[sublist[1]][sublist[0]].append(FruitPerson(*sublist[2:]))
    
    d['apple']
    Out[37]: defaultdict(<type 'list'>, {'2014W03': [FruitPerson(age=100, email='[email protected]')], '2014W02': [FruitPerson(age=19, email='[email protected]')], '2014W01': [FruitPerson(age=21, email='[email protected]')]})
    
    d['apple']['2014W01']
    Out[38]: [FruitPerson(age=21, email='[email protected]')]
    
    d['apple']['2014W01'][0].email
    Out[40]: '[email protected]'
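
The functools.partial alternative to the lambda could look roughly like the sketch below. It also shows one way to convert the nested defaultdicts to plain dicts afterwards, so that looking up a missing key no longer silently creates it. The name plain is hypothetical, and the data is copied from the question.

```python
from collections import defaultdict, namedtuple
from functools import partial

FruitPerson = namedtuple('FruitPerson', 'age email')

# Sample rows copied from the question (emails are placeholders in the source).
lsta = [
    ['2014W01', 'apple', 21, '[email protected]'],
    ['2014W02', 'apple', 19, '[email protected]'],
    ['2014W02', 'banana', 51, '[email protected]'],
    ['2014W03', 'apple', 100, '[email protected]'],
    ['2014W01', 'banana', 71, '[email protected]'],
    ['2014W02', 'organge', 21, '[email protected]'],
]

# partial(defaultdict, list) is a callable that returns a fresh
# defaultdict(list) each time an outer key is missing.
d = defaultdict(partial(defaultdict, list))

for week, fruit, age, email in lsta:
    d[fruit][week].append(FruitPerson(age, email))

# Optional: freeze into plain dicts so missing keys raise KeyError
# instead of being created on lookup.
plain = {fruit: dict(weeks) for fruit, weeks in d.items()}
```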
    

    Though honestly, at this point you should consider moving up to a real relational database that understands queries of the form SELECT whatever FROM whatever WHERE something.
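
As a rough illustration of the database route, here is a sketch using the standard library's sqlite3 module with an in-memory database. The table name fruit and its column names are assumptions made up for this example (the third column reuses the 'age' label from the namedtuple above); nothing in the question prescribes them.

```python
import sqlite3

# Sample rows copied from the question (emails are placeholders in the source).
lsta = [
    ['2014W01', 'apple', 21, '[email protected]'],
    ['2014W02', 'apple', 19, '[email protected]'],
    ['2014W02', 'banana', 51, '[email protected]'],
    ['2014W03', 'apple', 100, '[email protected]'],
    ['2014W01', 'banana', 71, '[email protected]'],
    ['2014W02', 'organge', 21, '[email protected]'],
]

conn = sqlite3.connect(':memory:')  # throwaway in-memory database
# Hypothetical schema: one column per position in each sublist.
conn.execute(
    'CREATE TABLE fruit (week TEXT, fruit TEXT, age INTEGER, email TEXT)'
)
conn.executemany(
    'INSERT INTO fruit VALUES (?, ?, ?, ?)',
    (tuple(row) for row in lsta),
)

# Categorize by fruit: all apple rows, ordered by week.
apple_rows = conn.execute(
    'SELECT week, age FROM fruit WHERE fruit = ? ORDER BY week',
    ('apple',),
).fetchall()

# Categorize by fruit and count the rows in each category.
counts = conn.execute(
    'SELECT fruit, COUNT(*) FROM fruit GROUP BY fruit ORDER BY fruit'
).fetchall()

conn.close()
```

Filtering by both fruit and week then becomes a second condition in the WHERE clause rather than another level of nesting.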