Consider an array Y of 0s and 1s, for example Y = (0,1,1,0). I want to count the number of uninterrupted intervals (runs) of 0s and of 1s; in this example n0 = 2 and n1 = 1. I have a script that does what I need, but it is not very elegant. Does anyone know a smoother or more pythonic version?
import pandas as pd
import numpy as np
# storage
counter = {}
# number of random draws
n = 10
# dataframe of n random draws, each either 0 or 1
Y = pd.DataFrame(np.random.choice(2, n))
# where are the 0s and 1s
idx_0 = Y[Y[0] == 0].index
idx_1 = Y[Y[0] == 1].index
# count intervals of uninterrupted 0s
j = 0
for i in idx_0:
    if i+1 < n:
        if Y.loc[i+1, 0] == 1:
            j += 1
        else:
            continue
if Y.loc[n-1, 0] == 0:
    j += 1
counter['n_0'] = j
# count intervals of uninterrupted 1s
j = 0
for i in idx_1:
    if i+1 < n:
        if Y.loc[i+1, 0] == 0:
            j += 1
        else:
            continue
if Y.loc[n-1, 0] == 1:
    j += 1
counter['n_1'] = j
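The loop above counts the positions where a run of a given value ends: either the next element differs, or the element is the last one. A minimal NumPy sketch of that same rule, assuming the data is a 1-D sequence of 0s and 1s (the function name count_runs_by_end is hypothetical):

```python
import numpy as np

def count_runs_by_end(y):
    """Count runs of 0s and 1s by locating the last element of each run."""
    y = np.asarray(y)
    # A run ends at i if y[i] != y[i+1]; the last element always ends a run.
    ends = np.append(y[:-1] != y[1:], True)
    return {
        'n_0': int(np.sum(ends & (y == 0))),
        'n_1': int(np.sum(ends & (y == 1))),
    }

print(count_runs_by_end([0, 1, 1, 0]))  # {'n_0': 2, 'n_1': 1}
```

This replaces the two explicit loops and the separate last-element check with a single boolean mask.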
A more succinct solution taking advantage of pandas methods:
counter = Y[0][Y[0].diff() != 0].value_counts()
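- Y[0].diff() computes the difference between consecutive elements (the first element has no predecessor, so its difference is NaN)
- diff != 0 marks the indices where the value changes; since NaN != 0 is True, the first element is always marked, so the first run is counted as well
- Y[0][idx].value_counts() counts how often each value occurs among these run starts, i.e. the number of runs of each value

Example result for 10 random elements, [0, 1, 1, 0, 1, 1, 1, 1, 1, 1]: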
1 2
0 2
Name: 0, dtype: int64
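To see the intermediate steps of the one-liner, here is a small runnable sketch on the same 10-element example, with the input fixed instead of drawn at random:

```python
import pandas as pd

# The 10-element example, as a one-column DataFrame (column label 0)
Y = pd.DataFrame([0, 1, 1, 0, 1, 1, 1, 1, 1, 1])

d = Y[0].diff()        # NaN, 1.0, 0.0, -1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0
starts = d != 0        # True where a new run begins (NaN != 0 is True, so row 0 counts)
print(Y[0][starts].tolist())   # [0, 1, 0, 1]: the first value of each run

counter = Y[0][starts].value_counts()
print(counter[0], counter[1])  # 2 2
```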
If you insist on having the keys as 'n_0' and 'n_1' instead, you can rename them with:
counter = counter.rename(index={i: f'n_{i}' for i in range(2)})
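One thing worth checking: value_counts only reports values that actually occur, so if a draw contains no 0s at all, counter has no entry for 0 and the rename produces no 'n_0' key. A minimal sketch that fills in a zero count with reindex before renaming (the all-ones input is only an illustration):

```python
import pandas as pd

# An input with no 0s at all, so value_counts() has no entry for 0
Y = pd.DataFrame([1, 1, 1])

counter = Y[0][Y[0].diff() != 0].value_counts()
# Make sure both values are present, using a count of 0 for a value that never occurs
counter = counter.reindex(range(2), fill_value=0)
counter = counter.rename(index={i: f'n_{i}' for i in range(2)})
print(counter['n_0'], counter['n_1'])  # 0 1
```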
You can also convert it to a plain dict with dict(counter), although the pandas Series already supports the same lookup: counter[key] gives you the respective value.
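A short sketch of both options on the question's example Y = (0,1,1,0). Note that dict(counter) keeps NumPy integer values; the int() conversion below is an optional extra step for when plain Python ints are wanted:

```python
import pandas as pd

# The example array from the question
Y = pd.DataFrame([0, 1, 1, 0])
counter = Y[0][Y[0].diff() != 0].value_counts()
counter = counter.rename(index={i: f'n_{i}' for i in range(2)})

# Label lookup works directly on the Series
print(counter['n_0'], counter['n_1'])  # 2 1

# dict() reads the Series' keys (its index) and looks up each label;
# the values are NumPy integers, so convert them if plain ints are needed
as_dict = {key: int(value) for key, value in dict(counter).items()}
print(as_dict)  # {'n_0': 2, 'n_1': 1}
```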