Tags: python, pandas, performance, allocation

What is the fastest way to calculate and add a column in pandas?


I would like to add a column to a DataFrame containing the exponentially weighted moving average (EWM) of a value column, computed separately for each country and code.

Currently, I am using two nested for loops:

for country in Country_Names:
    for i in i_Codes:
        EMA = df[(df['COUNTRY_NAME']==country) & (df['I_CODE']==i)].KRI_VALUE.ewm(span=6, adjust=False).mean()
        df.loc[(df['COUNTRY_NAME']==country) & (df['I_CODE']==i), 'EMA'] = EMA

This is quite slow (it takes a few minutes, and I have more than 50,000 rows). Does anyone have a better idea?

Many thanks!

ODO22


Solution

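  • I can only guess without seeing the data, but grouping on the two key columns and transforming KRI_VALUE should replace both loops. Note that groupby takes the column names, not the lists of values:

    df['EMA'] = (df.groupby(['COUNTRY_NAME', 'I_CODE'])['KRI_VALUE']
                   .transform(lambda x: x.ewm(span=6, adjust=False).mean()))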
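
A quick way to check that a grouped transform gives the same numbers as the nested loops from the question is to run both on a small made-up frame. This is only a sketch: the sample data and the helper names `ema_loop` and `ema_grouped` are hypothetical, and only the column names are taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "COUNTRY_NAME": rng.choice(["FR", "DE", "IT"], size=30),
    "I_CODE": rng.choice(["A", "B"], size=30),
    "KRI_VALUE": rng.normal(size=30),
})


def ema_loop(frame):
    """Loop approach: build one boolean mask per (country, code) pair."""
    out = pd.Series(np.nan, index=frame.index)
    for country in frame["COUNTRY_NAME"].unique():
        for code in frame["I_CODE"].unique():
            mask = (frame["COUNTRY_NAME"] == country) & (frame["I_CODE"] == code)
            if not mask.any():
                continue  # this combination does not occur in the data
            ema = frame.loc[mask, "KRI_VALUE"].ewm(span=6, adjust=False).mean()
            out.loc[mask] = ema.to_numpy()
    return out


def ema_grouped(frame):
    """Grouped approach: a single pass over the groups, no per-pair masks."""
    return (frame.groupby(["COUNTRY_NAME", "I_CODE"])["KRI_VALUE"]
                 .transform(lambda s: s.ewm(span=6, adjust=False).mean()))


df["EMA"] = ema_grouped(df)
print(np.allclose(ema_loop(df), df["EMA"]))
```

The loop rescans the whole frame for every (country, code) pair, while the grouped version splits the rows once, which is where the time saving on the real data would be expected to come from.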