Tags: python, pandas, numpy, cumsum

Numpy array counter with reset


I have a numpy array with only -1, 1 and 0, like this:

np.array([1,1,-1,-1,0,-1,1])

I would like a new array that keeps a running count of the -1 values encountered. The counter must reset to 0 when a 0 appears and stay unchanged when the value is 1:

Desired output:

np.array([0,0,1,2,0,1,1])

The solution must be fast on larger arrays (up to 100,000 elements).


Edit: Thanks for your contributions. I have a working solution for now.

I'm still looking for a non-iterative way to solve it (no for loop). Maybe with a pandas Series and the cumsum() method?


Solution

  • Maybe with a pandas Series and the cumsum() method?

    Yes, use Series.cumsum and Series.groupby:

    import pandas as pd

    s = pd.Series([1, 1, -1, -1, 0, -1, 1])
    
    s.eq(-1).groupby(s.eq(0).cumsum()).cumsum().to_numpy()
    # array([0, 0, 1, 2, 0, 1, 1])
    

    Step-by-step

    1. Create pseudo-groups whose label increments at each 0, so that every 0 starts a new group:

      groups = s.eq(0).cumsum()
      # Series with values [0, 0, 0, 0, 1, 1, 1]
      
    2. Then group by these pseudo-groups and take the cumulative sum of the s.eq(-1) mask within each group:

      s.eq(-1).groupby(groups).cumsum().to_numpy()
      # array([0, 0, 1, 2, 0, 1, 1])
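
The same two-step idea can also be sketched in NumPy alone, without a groupby. This is an alternative technique, not the answer's pandas method: it takes a global running count of -1 and subtracts the value that count had at the most recent 0. The function name count_with_reset is hypothetical, chosen only for this illustration.

```python
import numpy as np

def count_with_reset(a):
    """Running count of -1 values that restarts at every 0 (hypothetical helper)."""
    a = np.asarray(a)
    # Running count of -1 over the whole array, ignoring resets.
    total = np.cumsum(a == -1)
    # At each 0, record the running count reached so far; elsewhere record 0.
    # Because `total` never decreases, a running maximum carries the value
    # from the most recent 0 forward to the following positions.
    base = np.maximum.accumulate(np.where(a == 0, total, 0))
    # Subtracting the baseline restarts the count after each 0.
    return total - base

print(count_with_reset([1, 1, -1, -1, 0, -1, 1]))
# [0 0 1 2 0 1 1]
```

Whether this is faster than the pandas version on a given machine is not measured here; it mainly avoids building a Series when the input is already an array.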
      

    Timings

    On the requirement that the solution be fast on larger arrays (up to 100,000 elements):

    groupby + cumsum is ~8x faster than looping for a = np.random.choice([-1, 0, 1], size=100_000):

    %timeit series_cumsum(a)
    # 3.29 ms ± 721 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    
    %timeit miki_loop(a)
    # 26.5 ms ± 925 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
    
    %timeit skyrider_loop(a)
    # 26.8 ms ± 1.36 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
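
The timed functions are not defined in this answer, so the following is only a sketch of how such a comparison could be reproduced outside IPython. It assumes series_cumsum simply wraps the one-liner above, and plain_loop is a hypothetical baseline written for this illustration; it is not the miki_loop or skyrider_loop code from the other answers. Absolute timings will differ by machine and library version.

```python
import timeit

import numpy as np
import pandas as pd

def series_cumsum(a):
    # Assumed wrapper around the groupby + cumsum one-liner.
    s = pd.Series(a)
    return s.eq(-1).groupby(s.eq(0).cumsum()).cumsum().to_numpy()

def plain_loop(a):
    # Hypothetical pure-Python baseline, not taken from the other answers.
    out = np.empty(len(a), dtype=np.int64)
    count = 0
    for i, x in enumerate(a):
        if x == 0:
            count = 0        # a 0 resets the counter
        elif x == -1:
            count += 1       # a -1 increments it; a 1 leaves it unchanged
        out[i] = count
    return out

rng = np.random.default_rng(0)
a = rng.choice([-1, 0, 1], size=100_000)

# Both approaches should agree before their speed is compared.
assert np.array_equal(series_cumsum(a), plain_loop(a))

for fn in (series_cumsum, plain_loop):
    # Best of 3 repeats, 5 calls each, reported per call.
    best = min(timeit.repeat(lambda: fn(a), number=5, repeat=3)) / 5
    print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
```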