I have a numpy array with only -1, 1 and 0, like this:
np.array([1,1,-1,-1,0,-1,1])
I would like a new array that keeps a running count of the -1 values encountered. The counter must reset to 0 when a 0 appears and stay unchanged when the value is 1:
Desired output:
np.array([0,0,1,2,0,1,1])
The solution must be fast when used with larger arrays (up to 100,000 elements).
Edit: Thanks for your contribution, I've a working solution for now.
I'm still looking for a non-iterative way to solve it (no for loop). Maybe with a pandas Series and the cumsum() method?
Maybe with a pandas Series and the cumsum() method?
Yes, use Series.cumsum and Series.groupby:
s = pd.Series([1, 1, -1, -1, 0, -1, 1])
s.eq(-1).groupby(s.eq(0).cumsum()).cumsum().to_numpy()
# array([0, 0, 1, 2, 0, 1, 1])
Create pseudo-groups that reset when equal to 0:
groups = s.eq(0).cumsum()
# Series with values [0, 0, 0, 0, 1, 1, 1]
Then groupby these pseudo-groups and cumsum when equal to -1:
s.eq(-1).groupby(groups).cumsum().to_numpy()
# array([0, 0, 1, 2, 0, 1, 1])
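The two steps above can be wrapped into one function and checked against a plain loop. This is only a minimal sketch: the body of series_cumsum is an assumption based on the one-liner shown earlier, and loop_reference is a hypothetical helper written here purely for the comparison.

```python
import numpy as np
import pandas as pd


def series_cumsum(a):
    """Running count of -1 values that resets to 0 at every 0."""
    s = pd.Series(a)
    # The group id increases by one at each 0, so every 0 starts a new group.
    groups = s.eq(0).cumsum()
    # Within each group, cumulatively sum the boolean mask "value == -1".
    return s.eq(-1).groupby(groups).cumsum().to_numpy()


def loop_reference(a):
    """Hypothetical straightforward loop, used only to check the result."""
    out = np.empty(len(a), dtype=np.int64)
    count = 0
    for i, x in enumerate(a):
        if x == 0:
            count = 0      # reset on 0
        elif x == -1:
            count += 1     # count each -1
        out[i] = count     # a 1 leaves the counter unchanged
    return out


a = np.array([1, 1, -1, -1, 0, -1, 1])
result = series_cumsum(a)
print(result)
```

Since a 0 always opens a new group and contributes nothing to the sum, the count at a 0 is 0 without any special handling.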
not time consuming when used with larger array (up to 100,000)
groupby + cumsum is ~8x faster than looping, given np.random.choice([-1, 0, 1], size=100_000):
%timeit series_cumsum(a)
# 3.29 ms ± 721 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit miki_loop(a)
# 26.5 ms ± 925 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit skyrider_loop(a)
# 26.8 ms ± 1.36 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
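To reproduce a comparison like this outside IPython, something along these lines should work. It is a sketch under assumptions: the bodies of the timed functions are not shown above, so series_cumsum here is a guess based on the one-liner, and plain_loop is a hypothetical stand-in for the loop-based answers (miki_loop, skyrider_loop). Timings will differ between machines and pandas versions.

```python
import timeit

import numpy as np
import pandas as pd


def series_cumsum(a):
    # Assumed body: the groupby + cumsum one-liner from this answer.
    s = pd.Series(a)
    return s.eq(-1).groupby(s.eq(0).cumsum()).cumsum().to_numpy()


def plain_loop(a):
    # Hypothetical stand-in for the loop-based solutions.
    out = np.empty(len(a), dtype=np.int64)
    count = 0
    for i, x in enumerate(a):
        if x == 0:
            count = 0
        elif x == -1:
            count += 1
        out[i] = count
    return out


rng = np.random.default_rng(0)
a = rng.choice([-1, 0, 1], size=100_000)

# Make sure both approaches agree before timing them.
same = np.array_equal(series_cumsum(a), plain_loop(a))

runs = 5
t_series = min(timeit.repeat(lambda: series_cumsum(a), number=runs, repeat=3)) / runs
t_loop = min(timeit.repeat(lambda: plain_loop(a), number=runs, repeat=3)) / runs

print(f"results match: {same}")
print(f"series_cumsum: {t_series * 1e3:.2f} ms per call")
print(f"plain_loop:    {t_loop * 1e3:.2f} ms per call")
```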