I've checked some topics about groupby()
but I don't get what's wrong with my example:
import itertools

students = [{'name': 'Paul', 'mail': '@gmail.com'},
            {'name': 'Tom', 'mail': '@yahoo.com'},
            {'name': 'Jim', 'mail': 'gmail.com'},
            {'name': 'Jules', 'mail': '@something.com'},
            {'name': 'Gregory', 'mail': '@gmail.com'},
            {'name': 'Kathrin', 'mail': '@something.com'}]
key_func = lambda student: student['mail']
for key, group in itertools.groupby(students, key=key_func):
    print(key)
    print(list(group))
This prints each student separately. Why don't I get only 3 groups: @gmail.com, @yahoo.com and @something.com?
For starters, one of the mails is gmail.com (without the @) while the others are @gmail.com, which is why they are treated as separate groups.
groupby only groups consecutive items, so it also expects the data to be pre-sorted by the same key function, which explains why you get @something.com twice.
From the docs:
... Generally, the iterable needs to already be sorted on the same key function. ...
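To make the "consecutive items" behaviour concrete, here is a minimal sketch on a toy list (the `letters` list is made up for illustration and is not part of the question's data). Without sorting, the same key can show up in more than one group:

```python
import itertools

# Toy data: 'a' appears in two separate runs.
letters = ["a", "a", "b", "a"]

# groupby starts a new group every time the key changes.
runs = [(key, list(group)) for key, group in itertools.groupby(letters)]
print(runs)  # [('a', ['a', 'a']), ('b', ['b']), ('a', ['a'])]

# Sorting first puts equal keys next to each other, so each key yields one group.
sorted_runs = [(key, list(group)) for key, group in itertools.groupby(sorted(letters))]
print(sorted_runs)  # [('a', ['a', 'a', 'a']), ('b', ['b'])]
```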
import itertools

students = [{'name': 'Paul', 'mail': '@gmail.com'}, {'name': 'Tom', 'mail': '@yahoo.com'},
            {'name': 'Jim', 'mail': 'gmail.com'}, {'name': 'Jules', 'mail': '@something.com'},
            {'name': 'Gregory', 'mail': '@gmail.com'}, {'name': 'Kathrin', 'mail': '@something.com'}]
key_func = lambda student: student['mail']
# sort by the same key function we later use with groupby
students.sort(key=key_func)
for key, group in itertools.groupby(students, key=key_func):
    print(key)
    print(list(group))
# @gmail.com
# [{'name': 'Paul', 'mail': '@gmail.com'}, {'name': 'Gregory', 'mail': '@gmail.com'}]
# @something.com
# [{'name': 'Jules', 'mail': '@something.com'}, {'name': 'Kathrin', 'mail': '@something.com'}]
# @yahoo.com
# [{'name': 'Tom', 'mail': '@yahoo.com'}]
# gmail.com
# [{'name': 'Jim', 'mail': 'gmail.com'}]
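If you can't edit the data itself, one possible workaround is to normalize the key instead. The sketch below assumes that a missing leading @ is the only inconsistency; `normalized_mail` is a hypothetical helper name, and the list is shortened to three students for brevity:

```python
import itertools

students = [{'name': 'Paul', 'mail': '@gmail.com'},
            {'name': 'Jim', 'mail': 'gmail.com'},
            {'name': 'Tom', 'mail': '@yahoo.com'}]

def normalized_mail(student):
    # Hypothetical helper: strip any leading '@' so both spellings compare equal.
    return student['mail'].lstrip('@')

# Sort and group with the same normalized key.
students.sort(key=normalized_mail)
groups = {key: [s['name'] for s in group]
          for key, group in itertools.groupby(students, key=normalized_mail)}
print(groups)  # {'gmail.com': ['Paul', 'Jim'], 'yahoo.com': ['Tom']}
```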
After fixing both the sorting and the gmail.com/@gmail.com mismatch, we get the expected output:
import itertools
students = [{'name': 'Paul', 'mail': '@gmail.com'}, {'name': 'Tom', 'mail': '@yahoo.com'},
{'name': 'Jim', 'mail': '@gmail.com'}, {'name': 'Jules', 'mail': '@something.com'},
{'name': 'Gregory', 'mail': '@gmail.com'}, {'name': 'Kathrin', 'mail': '@something.com'}]
key_func = lambda student: student['mail']
students.sort(key=key_func)
for key, group in itertools.groupby(students, key=key_func):
    print(key)
    print(list(group))
# @gmail.com
# [{'name': 'Paul', 'mail': '@gmail.com'},
#  {'name': 'Jim', 'mail': '@gmail.com'},
#  {'name': 'Gregory', 'mail': '@gmail.com'}]
# @something.com
# [{'name': 'Jules', 'mail': '@something.com'},
#  {'name': 'Kathrin', 'mail': '@something.com'}]
# @yahoo.com
# [{'name': 'Tom', 'mail': '@yahoo.com'}]
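As an aside, if you only need the groups and don't care about processing them lazily, a plain dictionary avoids the sorting requirement altogether. This is an alternative to groupby rather than a fix for it; a minimal sketch using `collections.defaultdict` (the `by_mail` name is just for illustration):

```python
from collections import defaultdict

students = [{'name': 'Paul', 'mail': '@gmail.com'}, {'name': 'Tom', 'mail': '@yahoo.com'},
            {'name': 'Jim', 'mail': '@gmail.com'}, {'name': 'Jules', 'mail': '@something.com'},
            {'name': 'Gregory', 'mail': '@gmail.com'}, {'name': 'Kathrin', 'mail': '@something.com'}]

# Collect names per mail; the input does not need to be sorted.
by_mail = defaultdict(list)
for student in students:
    by_mail[student['mail']].append(student['name'])

print(dict(by_mail))
# {'@gmail.com': ['Paul', 'Jim', 'Gregory'], '@yahoo.com': ['Tom'], '@something.com': ['Jules', 'Kathrin']}
```

Groups come out in order of first appearance rather than sorted by key.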