Search code examples
pythonpandasdataframecumsum

Pandas Dataframe cumsum on selected range


I have a dataframe as below.

Part   Quantity   Inventory
AAA    -1         8
AAA    -1         NaN 
AAA    -5         NaN 
AAA    10         NaN 
AAA    -3         NaN 
BBB    5          8 
BBB    -3         NaN 
BBB    -3         NaN 
BBB    5          NaN 
BBB    -3         NaN

I have the following code to replace the NaN values with the cumulative sum from the 'Quantity' column and starting value is first non-NaN in 'Inventory':

df.loc[~df['Inventory'].isna(), 'Quantity'] = df['Inventory']
mask = ~df['Inventory'].isna()
group = mask.cumsum()
df['Inventory'] = df.groupby(group)['Quantity'].cumsum()
df.loc[mask, 'Inventory'] = df['Quantity']

This gives me as result:

Part   Quantity   Inventory
AAA    8          8
AAA    -1         7 
AAA    -5         2
AAA    10         12
AAA    -3         9
BBB    7          7 
BBB    -3         4
BBB    -3         1
BBB    5          6
BBB    -3         3

Which is all good, except that the first value of each 'Part' group in the 'Quantity column is replaced by the first value in the 'Inventory' column.

Any suggestions on how to avoid this and keep the initial value in the 'Quantity' column?


Solution

  • Rewrite your logic to avoid modifying the original column:

    mask = ~df['Inventory'].isna()
    group = mask.cumsum()
    
    df.loc[~mask, 'Inventory'] = (df['Inventory'].fillna(df['Quantity'])
                                 .groupby(group).cumsum())
    

    Output:

      Part  Quantity  Inventory
    0  AAA        -1        8.0
    1  AAA        -1        7.0
    2  AAA        -5        2.0
    3  AAA        10       12.0
    4  AAA        -3        9.0
    5  BBB         5        8.0
    6  BBB        -3        5.0
    7  BBB        -3        2.0
    8  BBB         5        7.0
    9  BBB        -3        4.0