
Python: count number of uninterrupted intervals


Consider an array Y of 0s and 1s, for example Y = (0,1,1,0). I want to count the number of uninterrupted intervals (runs) of 0s and of 1s. In this example, n0 = 2 and n1 = 1. I have a script that does this, but it is not very elegant. Does anyone know a cleaner or more Pythonic version?

import pandas as pd
import numpy as np

# storage
counter = {}

# number of random draws
n = 10

# dataframe of n random draws, each either 0 or 1
Y = pd.DataFrame(np.random.choice(2, n))

# where are the 0s and 1s
idx_0 = Y[Y[0] == 0].index
idx_1 = Y[Y[0] == 1].index

# count intervals of uninterrupted 0s
j = 0
for i in idx_0:
    if i+1 < n:
        if Y.loc[i+1, 0] == 1:
            j += 1
        else:
            continue

if Y.loc[n-1, 0] == 0:
    j += 1


counter['n_0'] = j

# count intervals of uninterrupted 1s
j = 0
for i in idx_1:
    if i+1 < n:
        if Y.loc[i+1, 0] == 0:
            j += 1
        else:
            continue

if Y.loc[n-1, 0] == 1:
    j += 1

counter['n_1'] = j

Solution

  • A more succinct solution taking advantage of pandas methods:

    counter = Y[0][Y[0].diff() != 0].value_counts()
    
    • Y[0].diff() computes the difference between consecutive elements (the first entry is NaN)
    • diff != 0 marks the indices where the value changes
    • Y[idx].value_counts() counts the frequency of each value

    Example result for 10 random elements, [0, 1, 1, 0, 1, 1, 1, 1, 1, 1]:

    1    2
    0    2
    Name: 0, dtype: int64
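
To make the three steps above concrete, here is a minimal sketch that spells out each intermediate result. It assumes the fixed example list from this answer instead of a random draw, and the names `s`, `changes` and `run_starts` are only illustrative:

```python
import pandas as pd

# Fixed input taken from the example above (an assumption; the question uses random draws)
Y = pd.DataFrame([0, 1, 1, 0, 1, 1, 1, 1, 1, 1])

s = Y[0]

# True wherever the value differs from the previous one.
# The first entry of diff() is NaN, and NaN != 0 is True,
# so the first element always counts as the start of an interval.
changes = s.diff() != 0

# Keep only the first element of each uninterrupted interval
run_starts = s[changes]

# Count how many intervals start with each value
counter = run_starts.value_counts()

print(changes.tolist())     # [True, True, False, True, True, False, False, False, False, False]
print(run_starts.tolist())  # [0, 1, 0, 1]
```

Each kept element stands for exactly one interval, so counting them by value gives the number of intervals per value.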
    

    If you insist on having the keys as 'n_0' and 'n_1' instead, you can rename them with

    counter = counter.rename(index={i: f'n_{i}' for i in range(2)})
    

    You can also convert the result to a dict with dict(counter), although the pandas Series already supports the same lookup: counter[key] returns the respective count.
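
If pandas is not otherwise needed, the same count can probably be obtained with the standard library alone. The following is a sketch of a different technique (itertools.groupby plus collections.Counter), not the pandas approach above; the helper name `count_runs` is hypothetical:

```python
from collections import Counter
from itertools import groupby


def count_runs(values):
    """Count the uninterrupted runs of each distinct value in a sequence."""
    # groupby yields one (key, group) pair per run of consecutive equal items,
    # so counting the keys counts the runs.
    return Counter(key for key, _ in groupby(values))


# Example from the question: Y = (0, 1, 1, 0)
counts = count_runs((0, 1, 1, 0))
print(counts[0], counts[1])  # 2 1
```

A Counter returns 0 for a value that never occurs, whereas indexing the pandas result with a missing key raises a KeyError.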