Iterate over numbers in a column and get the first row number where the n amount of repetition starts

In my dataset I need to find when there are number of 0's repeated more than 280 times and get back the first row number of the start of the repetition in an array.I'm using python 3.11.

Sample data:

differences

or to create a sample dataset:

   ACD=[0,5]

   df2 = pd.DataFrame(np.repeat(ACD, 100, axis=0))
   df3=df2.sample(frac=1,axis=1).sample(frac=1).reset_index(drop=True)

So far my code:

c=[]
for values,row in df.loc[:, ['differences']].iterrows():
        i=0
        while row['differences']  == 0:
            count = sum(1 for i in row)
            i +=1
            if count > 280:
                continue
            c.append(np.where(row['differences']))
        else:
            values+=1

Expected output:

row_number_rep= [5,90,120] #showing the specific row numbers where the repetition stars.

With this code I get the error

<stdin>:8: DeprecationWarning: Calling nonzero on 0d arrays is deprecated, as it behaves surprisingly. Use atleast_1d(arr).nonzero() if the old behavior was intended.

I need help to improve this code. I think the problem is that I do not have first 280 items as 0's and I need to keep iterating over the whole column to find all the row number of beginning of 0's repeated 280 times.

Solution

pandas approach

Assuming this example and a threshold of 4 (instead of 280):

df = pd.DataFrame({'differences': [0,0,0,0,0,1,2,0,3,0,0,0,0,0,0,4,0,5]})

    differences
0             0  # 0: first stretch of >4
1             0
2             0
3             0
4             0
5             1
6             2
7             0
8             3
9             0  # 9: second stretch of >4
10            0
11            0
12            0
13            0
14            0
15            4
16            0
17            5

You could perform a groupby.size to filter the groupby.first:

thresh = 4

m = df['differences'].eq(0)
group = (~m).cumsum().to_numpy()

g = df.reset_index()[m].groupby(group[m])
g.size()

out = g['index'].first()[g.size()>thresh].to_numpy()

Output: array([ 0, 9])

loop approach

lst = [0, 0, 0, 0, 0, 1, 2, 0, 3, 0, 0, 0, 0, 0, 0, 4, 0, 5, 0, 0, 0, 0, 0]
thresh = 4

start = -1
zeros = False
count = 0
out = []
for i, v in enumerate(lst+[-1]):
    if v==0:
        if not zeros:
            count = 0
            start = i
            zeros = True
        count += 1
        continue
    if count > thresh:
        if zeros:
            out.append(start)
    zeros = False

out
# [0, 9, 18]

`itertools.groupby` approach:

from itertools import groupby

lst = [0, 0, 0, 0, 0, 1, 2, 0, 3, 0, 0, 0, 0, 0, 0, 4, 0, 5, 0, 0, 0, 0, 0]
thesh  = 4

out = [x[0][0] for k,g in groupby(enumerate(lst), key=lambda x: x[1]==0)
       if k and len(x:=list(g))>thresh]
# [0, 9, 18]

Iterate over numbers in a column and get the first row number where the n amount of repetition starts

pandas approach

loop approach

itertools.groupby approach:

`itertools.groupby` approach: