Search code examples
python-3.xpandascumsum

Year wise cumsum based on condition on other column Python 3+ Pandas data frame


I have a dataframe with three columns as Year, Price, PV. I want a cumulative sum of column PV getting reset as per Year column, if values in Price column not equal to zero.

df = pd.DataFrame({"Year": [2000] * 3 + [2001] * 3,
                   "Value": [0,100,0,0,100,100],
                   "PV": [7,8,9,12,13,14],
                   'Cumsum':[0,15,0,0,25,39]})

print (df)
   Year  Value  PV  Cumsum
0  2000      0   7       0
1  2000    100   8      15
2  2000      0   9       0
3  2001      0  12       0
4  2001    100  13      25
5  2001    100  14      39

Solution

  • Use numpy.where with DataFrameGroupBy.cumsum:

    df['Cumsum'] = np.where(df['Value'] == 0, 0, df.groupby('Year')['PV'].cumsum())
    print (df)
       Year  Value  PV  Cumsum
    0  2000      0   7       0
    1  2000    100   8      15
    2  2000      0   9       0
    3  2001      0  12       0
    4  2001    100  13      25
    5  2001    100  14      39