python pandas dataframe apply

How to use groupby and do an iterative operation on a dataframe?


I know it sounds like a very simple problem, but I am struggling to find a proper solution. I have an input dataframe to which I want to add a new column, derived per area group through a simple arithmetic operation between amount and rate. I believe I have to loop over the rows so that each derived value can be computed from the previous one:

[image: input dataframe]

Output dataframe (with some comments added):

[image: output dataframe]

I am trying something like this:

def func(df):
    for i in range(1, len(df)):
        return (df['derived'].shift(1) * df['rate'])

df['derived'] = df['amount']
df['derived'] =  df.groupby(['area']).apply(func)

But I am getting this error:

ValueError: Buffer dtype mismatch, expected 'Python object' but got 'long'

Solution

  • You can use a groupby.cumprod with custom groups:

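    # rows with a missing rate mark the start of a new chain
    m = df['rate'].isna()

    # use amount on the starting rows and rate elsewhere, then take the running product per chain
    df['derived'] = df['amount'].where(m, df['rate']).groupby(m.cumsum()).cumprod()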

    Output: none provided, as the data is not reproducible.
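
Since the original data is not available, here is a minimal sketch on made-up data to show what the cumprod approach computes. The values below are hypothetical, and the sketch assumes that the first row of each area has a missing rate and that its amount seeds the chain, which is what the `isna()` mask in the answer appears to rely on. A plain loop is included only as a cross-check.

```python
import numpy as np
import pandas as pd

# Hypothetical data (not the asker's): each area starts with a row whose
# rate is missing; that row's amount is the starting value of the chain.
df = pd.DataFrame({
    "area":   ["A", "A", "A", "B", "B"],
    "amount": [100.0, 100.0, 100.0, 50.0, 50.0],
    "rate":   [np.nan, 2.0, 3.0, np.nan, 4.0],
})

# True on the rows that start a new chain
m = df["rate"].isna()

# amount on the starting rows, rate on all other rows
factors = df["amount"].where(m, df["rate"])

# m.cumsum() gives every chain its own label; cumprod multiplies within it
df["derived"] = factors.groupby(m.cumsum()).cumprod()

# Cross-check with an explicit loop: previous derived value times the rate
expected = []
prev = None
for amount, rate in zip(df["amount"], df["rate"]):
    prev = amount if pd.isna(rate) else prev * rate
    expected.append(prev)

print(df)
print(expected == df["derived"].tolist())
```

Note that the grouping key is `m.cumsum()`, not `area`. If an area in the real data can start without a missing rate, the chain would carry over from the previous area, and the key would need to be adjusted.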