I want to improve the time of a groupby
in python pandas.
I have this code:
df["Nbcontrats"] = df.groupby(['Client', 'Month'])['Contrat'].transform(len)
The objective is to count how many contracts a client has in a month and add this information in a new column (Nbcontrats
).
Client
: client codeMonth
: month of data extractionContrat
: contract numberI want to improve the time. Below I am only working with a subset of my real data:
%timeit df["Nbcontrats"] = df.groupby(['Client', 'Month'])['Contrat'].transform(len)
1 loops, best of 3: 391 ms per loop
df.shape
Out[309]: (7464, 61)
How can I improve the execution time?
With the DataFrameGroupBy.size
method:
df.set_index(['Client', 'Month'], inplace=True)
df['Nbcontrats'] = df.groupby(level=(0,1)).size()
df.reset_index(inplace=True)
The most work goes into assigning the result back into a column of the source DataFrame.