python, pandas, cumsum

Perform a cumulative sum over a column but reset to 0 if the sum becomes negative in Pandas


I have a pandas DataFrame with two columns, like this:

  Item  Value
0    A      7
1    A      2
2    A     -6
3    A    -70
4    A      8
5    A      0

I want to compute the cumulative sum over the Value column, but whenever the running total becomes negative, I want to reset it to 0.

I am currently using the loop shown below to do this:

sum_ = 0
cumsum = []

for val in sample['Value'].values:
    sum_ += val
    if sum_ < 0:
        sum_ = 0
    cumsum.append(sum_)

print(cumsum) # [7, 9, 3, 0, 8, 8]

I am looking for a more efficient way to perform this in pure pandas.


Solution

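  • This can be done with NumPy's np.frompyfunc, but it is slower than the Numba solution further down, because the resulting ufunc still calls a Python lambda for every element.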

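    import numpy as np

    # ufuncs built with frompyfunc return Python objects, so accumulate needs dtype=object
    sumlm = np.frompyfunc(lambda a, b: 0 if a + b < 0 else a + b, 2, 1)
    newx = sumlm.accumulate(df.Value.values, dtype=object)
    newx
    Out[147]: array([7, 9, 3, 0, 8, 8], dtype=object)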
    

    Here is the Numba solution:

    from numba import njit

    @njit
    def cumli(x, lim):
        total = 0
        result = []
        for y in x:
            total += y
            # reset the running total once it drops below the limit
            if total < lim:
                total = 0
            result.append(total)
        return result

    cumli(df.Value.values, 0)
    Out[166]: [7, 9, 3, 0, 8, 8]
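
Since the question asks for a pure pandas approach, here is a minimal vectorized sketch. It assumes the reset threshold is exactly 0, so it does not generalize to the `lim` parameter of `cumli`. It relies on the identity that the floored running total equals the plain cumulative sum minus the lowest cumulative sum seen so far, with that minimum capped at 0. The function name `cumsum_floor_zero` and the `CumSum` column are illustrative and not part of the original answer.

```python
import pandas as pd


def cumsum_floor_zero(values: pd.Series) -> pd.Series:
    """Cumulative sum that resets to 0 whenever the running total goes negative."""
    running = values.cumsum()
    # Lowest running total seen so far, capped at 0 so nothing is subtracted
    # until the plain cumulative sum first dips below zero.
    lowest = running.cummin().clip(upper=0)
    return running - lowest


# Same data as in the question
df = pd.DataFrame({"Item": ["A"] * 6, "Value": [7, 2, -6, -70, 8, 0]})
df["CumSum"] = cumsum_floor_zero(df["Value"])
print(df["CumSum"].tolist())  # [7, 9, 3, 0, 8, 8]
```

For integer data this should match the loop exactly. With floating-point values the result may differ from the loop by rounding error, because the subtraction happens on the accumulated totals rather than step by step.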