How to identify consecutive repeating values in a data frame column?

I am trying to figure out how to identify consecutive repeating values in a data frame column in Python, with a configurable number of consecutive repeats to look for. Let me explain with two examples.

I have the following data frame:

DateTime                 Value         
-------------------------------
2015-03-11 06:00:00          1               
2015-03-11 07:00:00          1               
2015-03-11 08:00:00          1               
2015-03-11 09:00:00          1               
2015-03-11 10:00:00          0               
2015-03-11 11:00:00          0               
2015-03-11 12:00:00          0               
2015-03-11 13:00:00          0               
2015-03-11 14:00:00          0               
2015-03-11 15:00:00          0               
...

Now I have the following question: in the "Value" column, is there ever an instance of 2 or more consecutive 0 values? Here the answer is yes, so I want to return True.

Now I have this data frame:

DateTime                 Value         
-------------------------------
2015-03-11 06:00:00          1               
2015-03-11 07:00:00          1               
2015-03-11 08:00:00          0               
2015-03-11 09:00:00          0               
2015-03-11 10:00:00          1               
2015-03-11 11:00:00          0               
2015-03-11 12:00:00          0               
2015-03-11 13:00:00          0               
2015-03-11 14:00:00          1               
2015-03-11 15:00:00          1               
...

Now I have the following question: in the "Value" column, is there ever an instance of 3 or more consecutive 0 values? Again the answer is yes, so I want to return True.

And of course, if the answer is no, I want to return False.

How can this be done in Python? What is this process even called? How can I make the number of consecutive values being looked for configurable?

Solution

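To detect consecutive runs in the series, we first find the turning points, i.e., the positions where the difference from the previous entry isn't 0. (The first entry has no predecessor, so its difference is NaN, which also counts as "not 0" and starts the first group.) The cumulative sum of these markers then gives every run its own group number: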

# for the second frame
>>> consecutives = df["Value"].diff().ne(0).cumsum()
>>> consecutives

0    1
1    1
2    2
3    2
4    3
5    4
6    4
7    4
8    5
9    5
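
To try this locally, here is a minimal, self-contained sketch. It assumes the frame holds only the ten visible rows of the second example (the rows behind the "..." are not reproduced), that "DateTime" is an ordinary column, and that the intermediate name `changed` is just for illustration:

```python
import pandas as pd

# Assumption: only the ten visible rows of the second frame are used.
df = pd.DataFrame({
    "DateTime": pd.date_range("2015-03-11 06:00:00", periods=10,
                              freq=pd.Timedelta(hours=1)),
    "Value": [1, 1, 0, 0, 1, 0, 0, 0, 1, 1],
})

# True wherever the value differs from the previous row (NaN for row 0 counts too).
changed = df["Value"].diff().ne(0)

# Each True bumps the counter, so every run gets its own label.
consecutives = changed.cumsum()

print(pd.DataFrame({"Value": df["Value"],
                    "changed": changed,
                    "run_id": consecutives}))
```

Printing the three columns side by side makes it easier to see that a new label starts exactly where `changed` is True.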

But since you're interested in the consecutive runs of one particular value (e.g., 0), we can mask the above to put NaN wherever the original series is not 0:

>>> masked_consecs = consecutives.mask(df["Value"].ne(0))
>>> masked_consecs

0    NaN
1    NaN
2    2.0
3    2.0
4    NaN
5    4.0
6    4.0
7    4.0
8    NaN
9    NaN

Now we can group by this series and look at the group sizes. groupby drops NaN keys by default, so only the runs of 0 remain:

>>> consec_sizes = df["Value"].groupby(masked_consecs).size().to_numpy()
>>> consec_sizes

array([2, 3])
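
If you skip the `.to_numpy()` call, the sizes stay attached to their run labels, which can help when you want to know which run was long enough. A small sketch, assuming the same ten values as in the second frame (the names `run_id`, `masked`, and `sizes` are only illustrative):

```python
import pandas as pd

# Assumption: the ten visible values of the second frame.
values = pd.Series([1, 1, 0, 0, 1, 0, 0, 0, 1, 1], name="Value")

# Label every run, then blank out the labels of runs that are not 0.
run_id = values.diff().ne(0).cumsum()
masked = run_id.mask(values.ne(0))

# Rows with a NaN label are dropped by groupby, so only runs of 0 are counted.
sizes = values.groupby(masked).size()

print(sizes)        # index: surviving run labels, values: run lengths
print(sizes.max())  # length of the longest run of 0
```

The labels come back as floats (2.0 and 4.0) because masking introduced NaN into the integer series.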

The final decision is made by comparing the sizes against the given threshold (e.g., 2) and checking whether any of them satisfies it:

>>> is_okay = (consec_sizes >= 2).any()
>>> is_okay
True

Now we can wrap this procedure in a function for reusability:

def is_consec_found(series, value=0, threshold=2):
    # mark consecutive groups
    consecs = series.diff().ne(0).cumsum()

    # disregard those groups that are not of `value`
    masked_consecs = consecs.mask(series.ne(value))

    # get size of each
    consec_sizes = series.groupby(masked_consecs).size().to_numpy()

    # check sizes against the threshold
    is_okay = (consec_sizes >= threshold).any()

    # whether a suitable sequence is found or not
    return is_okay

and we can run it as:

# these are all for the second dataframe you posted
>>> is_consec_found(df["Value"], value=0, threshold=2)
True

>>> is_consec_found(df["Value"], value=0, threshold=5)
False

>>> is_consec_found(df["Value"], value=1, threshold=2)
True

>>> is_consec_found(df["Value"], value=1, threshold=3)
False
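
One caveat: `diff()` needs numeric data. If the column holds strings or other labels, a possible variant compares each entry with its shifted neighbour instead. This is only a sketch; `has_run` and the `states` series are hypothetical names and data, not part of the question:

```python
import pandas as pd

def has_run(series, value, threshold=2):
    # A new run starts wherever an entry differs from the one before it.
    run_id = series.ne(series.shift()).cumsum()

    # Keep the labels of rows equal to `value`, then count rows per run.
    run_lengths = run_id[series.eq(value)].value_counts()

    # Plain Python bool: True if any run is long enough.
    return bool((run_lengths >= threshold).any())

# Hypothetical non-numeric data.
states = pd.Series(["on", "on", "off", "off", "off", "on"])
print(has_run(states, "off", threshold=3))
print(has_run(states, "off", threshold=4))

# The same ten values as the second frame in the question.
values = pd.Series([1, 1, 0, 0, 1, 0, 0, 0, 1, 1])
print(has_run(values, 0, threshold=3))
```

Counting only the rows that equal `value` replaces the mask-then-groupby step, and an empty result simply yields False.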