I am trying to use the functions groupby and itemgetter to rearrange a sorted list of tuples into groups:
from itertools import groupby
from operator import itemgetter
#initialize a list of tuples
indexed_qualityresults = [(u'moses-R4', 2.0), (u'moses-R4', 3.0), (u'lucy-R4', 3.0), (u'trados-R4', 2.0)]
#group tuples, using as a key the first element of each tuple
groupped_qualityresults = list(groupby(indexed_qualityresults, itemgetter(0)))
#print the key and the respective grouped tuples for each group
print "groupped_qualityresults =", [(a,list(b)) for a,b in groupped_qualityresults]
The output is:
groupped_qualityresults = [(u'moses-R4', []), (u'lucy-R4', []), (u'trados-R4', [(u'trados-R4', 2.0)])]
As you can see, the lists returned for the first two keys of my original tuple list are empty, although they shouldn't be.
Expected output:
groupped_qualityresults = [(u'moses-R4', [(u'moses-R4', 2.0), (u'moses-R4', 3.0)]), (u'lucy-R4', [(u'lucy-R4', 3.0)]), (u'trados-R4', [(u'trados-R4', 2.0)])]
Can somebody identify what's going wrong?
Don't call list() on the groupby iterator:
#group tuples, using as a key the first element of each tuple
groupped_qualityresults = groupby(indexed_qualityresults, itemgetter(0))
#print the key and the respective grouped tuples for each group
print "groupped_qualityresults =", [(a,list(b)) for a,b in groupped_qualityresults]
From the itertools.groupby() documentation:
The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible.
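To see the quoted behaviour in isolation, here is a small sketch that advances the groupby object by hand with next(). The variable names (groups, first_group, second_group and so on) are only illustrative; the data is the list from the question.

```python
from itertools import groupby
from operator import itemgetter

indexed_qualityresults = [(u'moses-R4', 2.0), (u'moses-R4', 3.0), (u'lucy-R4', 3.0), (u'trados-R4', 2.0)]

groups = groupby(indexed_qualityresults, itemgetter(0))

first_key, first_group = next(groups)    # positioned on the 'moses-R4' group
second_key, second_group = next(groups)  # advancing discards the 'moses-R4' group

first_items = list(first_group)    # empty: the groupby object has moved on
second_items = list(second_group)  # still available: this is the current group
```

Here first_items ends up empty while second_items still contains the 'lucy-R4' tuple, because only the group the groupby object is currently positioned on can be read.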
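Turning the output from groupby() into a list advances the groupby() object, so the earlier groups are already gone by the time you call list(b) on them, which is why they come back empty.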
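If you do need the grouped result as a list that you can reuse later, one option is to turn each group into a list while you are still iterating over the groupby object. This is a minimal sketch using the data from the question; the name grouped is just illustrative.

```python
from itertools import groupby
from operator import itemgetter

indexed_qualityresults = [(u'moses-R4', 2.0), (u'moses-R4', 3.0), (u'lucy-R4', 3.0), (u'trados-R4', 2.0)]

# Consume each group immediately, before groupby advances to the next key
grouped = [(key, list(group))
           for key, group in groupby(indexed_qualityresults, itemgetter(0))]
```

Because every group is copied into its own list before the next key is requested, grouped holds plain lists and can be iterated as many times as you like.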