Pandas dataframe, a cumsum calculation including max function

I'm sitting with a pandas dataframe and I have a time series problem where I have some values called diff. I need to calculate a value, here called sum, according to the below formula for each category separately:

sum_n = max(0, diff_n + sum_n-1 - factor)

factor = 2 (factor is a parameter and in this example set to 2)

The dataframe looks something like this and the value of sum is set to 0 for hour = 0:

category	hour	diff	sum
a	0	0	0
a	1	4	NaN
a	2	3	NaN
a	3	1	NaN
b	0	0	0
b	1	1	NaN
b	2	-5	NaN
b	3	4	NaN

My expected output is the following:

category	hour	diff	sum
a	0	0	0
a	1	4	2
a	2	3	3
a	3	1	2
b	0	0	0
b	1	1	0
b	2	-5	0
b	3	4	2

Any idea how to solve this? Preferably without iterrows or any for loops since there are a lot of rows.

Would be happy for any help here.

If it would have been without the max function I could have used something like this:

df['sum'] = df.groupby(['category'])['diff'].cumsum() - factor

But the max function messes things up for me.

Solution

You can use the following lambda function:

sumn = 0
def calc_sum(df):
    global sumn
    if not df['hour']: # Reset when hour=0
        sumn = 0
    sumn = max(0, df['diff'] + sumn - 2)
    return sumn

df['sum'] = df.groupby(['category']).apply(lambda df: df.apply(calc_sum, axis=1)).values

Output: