I have a sequence of zeros and ones, e.g. [0,1,0,0,0,1,0]. I want to measure the correlation between zeros in the sequence, i.e., given one zero, how likely is another zero to follow it? I wanted to do this using the correlation coefficient, but when I call numpy.corrcoef() on the sequence, it returns 1.0, which is not what I want. Any suggestions are appreciated. Here's code that reproduces the issue:
import numpy as np
x = np.random.randint(0,2,1000)
x = x[...,np.newaxis]
rho = np.corrcoef(x.T)
print(rho)
np.corrcoef(x.T) computes the correlation of the sequence with itself, which is always 1. To measure whether one value predicts the next, you need to compare x to its shifted self:
np.random.seed(0)
x = np.random.randint(0, 2, 1000)
rho = np.corrcoef(x[:-1], x[1:])
Output:
array([[1.        , 0.00292208],   # <- 0.00292208 is the value you want
       [0.00292208, 1.        ]])
We compare each value to the next one:
x
# array([0, 1, 1, 0, 1, ..., 0, 0, 1, 0])
# first through second-to-last value
x[:-1]
# array([0, 1, 1, 0, 1, ..., 0, 0, 1])
# second through last value
x[1:]
# array([1, 1, 0, 1, ..., 0, 0, 1, 0])
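If what you ultimately want is the conditional probability "given a zero, how likely is the next value also a zero", you can compute it directly from the same shifted pair of arrays, without going through the correlation coefficient. A minimal sketch (the variable names here are just for illustration):

```python
import numpy as np

np.random.seed(0)
x = np.random.randint(0, 2, 1000)

# Mask of positions whose current value is 0
# (looking only at x[:-1] so every position has a successor)
current_is_zero = x[:-1] == 0

# Among those positions, fraction whose next value is also 0:
# this estimates P(next == 0 | current == 0)
p_zero_after_zero = np.mean(x[1:][current_is_zero] == 0)
print(p_zero_after_zero)
```

For an independent fair coin sequence this estimate will be close to 0.5; a value noticeably above 0.5 would indicate that zeros tend to cluster, which is the same signal a positive lag-1 correlation picks up.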