Tags: python, pandas, retain

Python Retain function. Use value from previous row in calculation


In [10]: df
Out[10]:
     PART AVAILABLE_INVENTORY DEMAND
1    A    12                  6
2    A    12                  2
3    A    12                  1
4    B    24                  1
5    B    24                  1
6    B    24                  4
7    B    24                  3

Output wanted:

     PART AVAILABLE_INVENTORY DEMAND  AI   AI_AFTER
1    A    12                  6       12   6
2    A    12                  2       6    4
3    A    12                  1       4    3
4    B    24                  1       24   23
5    B    24                  1       23   22
6    B    24                  4       22   18
7    B    24                  3       18   15

The code I have so far is below but it is not giving the output I am looking for:

def retain(df):
    df['PREV_PART'] = df['PART'].shift()
    df['PREV_AI_AFTER'] = df['AI'].shift() - df['DEMAND'].shift()
    df['AI'] = np.where(df['PART'] != df['PREV_PART'], df['AI'], df['PREV_AI_AFTER'])
    df['AI_AFTER'] = df['AI'] - df['DEMAND']

df['AI'] = df['AVAILABLE_INVENTORY']
retain(df)

What is the fastest way to do this?


Solution

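  • You can do this with groupby: subtract the cumulative sum of 'DEMAND' within each 'PART' from 'AVAILABLE_INVENTORY' to get 'AI_AFTER', then shift the newly created 'AI_AFTER' within each 'PART' to get 'AI', filling the first row of each group with 'AVAILABLE_INVENTORY':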

    df['AI_AFTER'] = df['AVAILABLE_INVENTORY'] - df.groupby('PART')['DEMAND'].cumsum()
    df['AI'] = df.groupby('PART')['AI_AFTER'].shift().fillna(df['AVAILABLE_INVENTORY'])
    print(df)
      PART  AVAILABLE_INVENTORY  DEMAND  AI_AFTER    AI
    1    A                   12       6         6  12.0
    2    A                   12       2         4   6.0
    3    A                   12       1         3   4.0
    4    B                   24       1        23  24.0
    5    B                   24       1        22  23.0
    6    B                   24       4        18  22.0
    7    B                   24       3        15  18.0
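
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. A single shift() only looks one row back at the original values, so the deduction never carries forward past the second row of a group; the grouped cumsum() is what accumulates it. The variant below is not the exact code from the answer above: it assumes 'AVAILABLE_INVENTORY' is constant within each 'PART' (as in the sample data) and derives 'AI' as 'AI_AFTER' + 'DEMAND' instead of using shift(), which keeps the column as integers rather than floats. The function name add_running_inventory is made up for illustration.

```python
import pandas as pd

# Sample data from the question, with the same 1-based index.
df = pd.DataFrame(
    {
        "PART": ["A", "A", "A", "B", "B", "B", "B"],
        "AVAILABLE_INVENTORY": [12, 12, 12, 24, 24, 24, 24],
        "DEMAND": [6, 2, 1, 1, 1, 4, 3],
    },
    index=range(1, 8),
)


def add_running_inventory(frame):
    """Return a copy with AI (before demand) and AI_AFTER (after demand).

    Hypothetical helper; assumes AVAILABLE_INVENTORY is constant per PART.
    """
    out = frame.copy()
    # Inventory left after each row: starting stock minus the demand
    # accumulated so far within the same PART.
    out["AI_AFTER"] = out["AVAILABLE_INVENTORY"] - out.groupby("PART")["DEMAND"].cumsum()
    # Inventory before each row is the "after" value plus that row's demand.
    # No shift() means no NaN, so the dtype stays integer.
    out["AI"] = out["AI_AFTER"] + out["DEMAND"]
    return out[["PART", "AVAILABLE_INVENTORY", "DEMAND", "AI", "AI_AFTER"]]


result = add_running_inventory(df)
print(result)
```

Both steps are vectorized, so there is no Python-level loop over the rows. If 'AVAILABLE_INVENTORY' can change within a 'PART', this shortcut no longer matches the shift-based version and you would need to decide which row's inventory should apply.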