I have the following dataframe:
df = pd.DataFrame([
(1, 1, 'term1'),
(1, 2, 'term2'),
(1, 1, 'term1'),
(1, 1, 'term2'),
(2, 2, 'term3'),
(2, 3, 'term1'),
(2, 2, 'term1')
], columns=['id', 'group', 'term'])
I want to group it by id
and group
and calculate the number of each term for this id-group pair.
So in the end I want to get something like this:
Anyway I can achieve this without looping?
I use groupby
and size
df.groupby(['id', 'group', 'term']).size().unstack(fill_value=0)
1,000,000 rows
df = pd.DataFrame(dict(id=np.random.choice(100, 1000000),
group=np.random.choice(20, 1000000),
term=np.random.choice(10, 1000000)))