Search code examples
pythonpandasdataframegroup-bypivot-table

Groupby value counts on the dataframe pandas


I have the following dataframe:

df = pd.DataFrame([
    (1, 1, 'term1'),
    (1, 2, 'term2'),
    (1, 1, 'term1'),
    (1, 1, 'term2'),
    (2, 2, 'term3'),
    (2, 3, 'term1'),
    (2, 2, 'term1')
], columns=['id', 'group', 'term'])

I want to group it by id and group and calculate the number of each term for this id-group pair.

So in the end I want to get something like this:

enter image description here

Anyway I can achieve this without looping?


Solution

  • I use groupby and size

    df.groupby(['id', 'group', 'term']).size().unstack(fill_value=0)
    

    enter image description here


    Timing

    enter image description here

    1,000,000 rows

    df = pd.DataFrame(dict(id=np.random.choice(100, 1000000),
                           group=np.random.choice(20, 1000000),
                           term=np.random.choice(10, 1000000)))
    

    enter image description here