
itertools.groupby returns empty list items, when populated with operator.itemgetter


I am trying to use groupby and itemgetter to rearrange a sorted list of tuples into groups:

from itertools import groupby
from operator import itemgetter

#initialize a list of tuples
indexed_qualityresults = [(u'moses-R4', 2.0), (u'moses-R4', 3.0), (u'lucy-R4', 3.0), (u'trados-R4', 2.0)]

#group tuples, using as a key the first element of each tuple
groupped_qualityresults = list(groupby(indexed_qualityresults, itemgetter(0)))

#print the key and the respective grouped tuples for each group
print "groupped_qualityresults =", [(a,list(b)) for a,b in groupped_qualityresults]

The output is:

groupped_qualityresults = [(u'moses-R4', []), (u'lucy-R4', []), (u'trados-R4', [(u'trados-R4', 2.0)])]

As you can see, the lists returned for the first two keys of my original tuple list are empty, although they shouldn't be.

Expected output:

groupped_qualityresults = [(u'moses-R4', [(u'moses-R4', 2.0), (u'moses-R4', 3.0)]), (u'lucy-R4', [(u'lucy-R4', 3.0)]), (u'trados-R4', [(u'trados-R4', 2.0)])]

Can somebody identify what's going wrong?


Solution

  • Don't call list() on the groupby iterator:

    #group tuples, using as a key the first element of each tuple
    groupped_qualityresults = groupby(indexed_qualityresults, itemgetter(0))
    
    #print the key and the respective grouped tuples for each group
    print "groupped_qualityresults =", [(a,list(b)) for a,b in groupped_qualityresults]
    

    From the itertools.groupby() documentation:

    The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible.

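    Calling list() on the groupby() object advances it through every group before any group has been read. By the time list(b) runs in the comprehension, the earlier group iterators are already stale and yield nothing.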
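
    If you do need the grouped result stored in a list for later use, one option is to materialize each group while the groupby() object is still positioned on it. The following is a minimal sketch using the data from the question; the name grouped_qualityresults is only illustrative, and print is called with a single argument so the snippet should behave the same on Python 2 and Python 3:

```python
from itertools import groupby
from operator import itemgetter

# Same sorted list of tuples as in the question
indexed_qualityresults = [(u'moses-R4', 2.0), (u'moses-R4', 3.0),
                          (u'lucy-R4', 3.0), (u'trados-R4', 2.0)]

# Convert each group to a list inside the loop, before groupby() moves on
# to the next key. The outer list then holds real lists, not stale iterators.
grouped_qualityresults = [
    (key, list(group))
    for key, group in groupby(indexed_qualityresults, itemgetter(0))
]

print(grouped_qualityresults)
```

    Because every group is copied into its own list as soon as it is produced, the result can be iterated or indexed as often as needed afterwards.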