Search code examples
arraysnumpyresetcumulative-sum

How to calculate cumulative sums of ones with a reset each time a zero is encountered


I have an array made of 0 and 1. I want to calculate a cumulative sum of all consecutive 1 with a reset each time a 0 is met, using numpy as I have thousands of arrays of thousands of lines and columns.

I can do it with loops but I suspect it will not be efficient. Would you have a smarter and quick way to run it on the array. Here is short example of the input and the expected output:

import numpy as np
arr_in = np.array([[1,1,1,1,1,1], [0,0,0,0,0,0], [1,0,1,0,1,1], [0,1,1,1,0,0]])
print(arr_in)
print("expected result:")
arr_out = np.array([[1,2,3,4,5,6], [0,0,0,0,0,0], [1,0,1,0,1,2], [0,1,2,3,0,0]])
print(arr_out)

When you run it:

[[1 1 1 1 1 1]
 [0 0 0 0 0 0]
 [1 0 1 0 1 1]
 [0 1 1 1 0 0]]
expected result:
[[1 2 3 4 5 6]
 [0 0 0 0 0 0]
 [1 0 1 0 1 2]
 [0 1 2 3 0 0]]

Solution

  • With numba.vectorize you can define a custom numpy ufunc to use for accumulation.

    import numba as nb # v0.56.4, no support for numpy >= 1.22.0
    import numpy as np # v1.21.6
    
    @nb.vectorize([nb.int64(nb.int64, nb.int64)])
    def reset_cumsum(x, y):
        return x + y if y else 0
    
    arr_in = np.array([[1,1,1,1,1,1],
                       [0,0,0,0,0,0],
                       [1,0,1,0,1,1],
                       [0,1,1,1,0,0]])
    
    reset_cumsum.accumulate(arr_in, axis=1)
    

    Output

    array([[1, 2, 3, 4, 5, 6],
           [0, 0, 0, 0, 0, 0],
           [1, 0, 1, 0, 1, 2],
           [0, 1, 2, 3, 0, 0]])