Tags: python, pandas, numpy, pandas-groupby

Groupby in python pandas: Fast Way


I want to speed up a groupby in pandas. I have this code:

df["Nbcontrats"] = df.groupby(['Client', 'Month'])['Contrat'].transform(len)

The objective is to count how many contracts each client has in a given month and store that count in a new column (Nbcontrats).

  • Client: client code
  • Month: month of data extraction
  • Contrat: contract number
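To make the objective concrete, here is a minimal sketch on a made-up DataFrame. The values are invented for illustration; only the column names come from the question.

```python
import pandas as pd

# Toy data, invented for illustration; only the column names come from the question.
df = pd.DataFrame({
    "Client": ["A", "A", "A", "B", "B"],
    "Month": ["2015-01", "2015-01", "2015-02", "2015-01", "2015-01"],
    "Contrat": [101, 102, 103, 201, 202],
})

# Count the rows in each (Client, Month) group and broadcast that count
# back onto every row of the group.
df["Nbcontrats"] = df.groupby(["Client", "Month"])["Contrat"].transform(len)

print(df)
```

Client A has two contracts in 2015-01 and one in 2015-02, so its rows get 2, 2 and 1; both rows of client B get 2.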

The timing below was measured on only a subset of my real data:

%timeit df["Nbcontrats"] = df.groupby(['Client', 'Month'])['Contrat'].transform(len)
1 loops, best of 3: 391 ms per loop

df.shape
Out[309]: (7464, 61)

How can I improve the execution time?


Solution

  • With the DataFrameGroupBy.size method:

    df.set_index(['Client', 'Month'], inplace=True)
    df['Nbcontrats'] = df.groupby(level=(0,1)).size()
    df.reset_index(inplace=True)
    

    Most of the work goes into assigning the result back into a column of the source DataFrame.
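    As a sanity check, the index-based approach above can be compared on toy data with a variant that skips the index round trip by passing the string "size" to transform. This is only a sketch: the data is invented, the names via_index and via_transform are hypothetical, and no timing was measured here, so whether either variant is faster on the real data is an assumption to verify with %timeit.

```python
import pandas as pd

# Toy data, invented for illustration; only the column names come from the question.
df = pd.DataFrame({
    "Client": ["A", "A", "A", "B", "B"],
    "Month": ["2015-01", "2015-01", "2015-02", "2015-01", "2015-01"],
    "Contrat": [101, 102, 103, 201, 202],
})

# Approach from the answer: group on the index levels, then let pandas
# align the per-group sizes back onto the rows through the MultiIndex.
tmp = df.set_index(["Client", "Month"])
tmp["Nbcontrats"] = tmp.groupby(level=[0, 1]).size()
via_index = tmp.reset_index()

# Variant (not from the original answer): use the built-in "size" aggregation
# inside transform, which broadcasts the group size to each row directly.
via_transform = df.copy()
via_transform["Nbcontrats"] = (
    df.groupby(["Client", "Month"])["Contrat"].transform("size")
)

# Both routes should give the same counts per row.
print(via_index["Nbcontrats"].equals(via_transform["Nbcontrats"]))
```

    Both variants avoid calling the Python function len once per group, which is the likely reason the original transform(len) is slow.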