Tags: python, python-3.x, numpy, correlation

Correlation between zeros in a zero-one sequence


I have a sequence of zeros and ones, for example [0,1,0,0,0,1,0]. I want to measure the correlation between zeros in the sequence, i.e., given one zero, how likely is another zero to follow it? I wanted to do this using the correlation coefficient. However, if I apply numpy.corrcoef() to the sequence, it returns 1.0, which can't be right. Any suggestions are appreciated. Here's code that reproduces the problem:

    import numpy as np

    x = np.random.randint(0,2,1000)
    x = x[...,np.newaxis]
    rho = np.corrcoef(x.T)
    print(rho)
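Why this prints 1.0: `x.T` has shape `(1, 1000)`, so `np.corrcoef` sees a single variable (one row) and returns the correlation of that variable with itself, which is always 1.0 as long as its variance is non-zero. A minimal sketch, using a freshly generated random sequence (not the asker's data), contrasting one row with two rows:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(0, 2, 1000)  # random 0/1 sequence, shape (1000,)

# One row = one variable: equivalent to the question's x[..., np.newaxis].T.
# corrcoef correlates that single variable with itself, so the result is 1.0.
single = np.corrcoef(x[np.newaxis, :])
print(single)  # 1.0, as in the question

# Two rows = two variables: the result is a 2x2 matrix whose
# off-diagonal entry is the correlation between the two rows.
pair = np.corrcoef(x[:-1], x[1:])
print(pair.shape)  # (2, 2)
```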


Solution

  • You need to compare x to a copy of itself shifted by one position (its lag-1 autocorrelation):

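    import numpy as np

    np.random.seed(0)  # fix the seed so the result is reproducible

    x = np.random.randint(0, 2, 1000)

    # correlate each element with the element that follows it;
    # rho[0, 1] (== rho[1, 0]) is the lag-1 correlation
    rho = np.corrcoef(x[:-1], x[1:])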

    Output:

    array([[1.        , 0.00292208],  # <- this is the value you want
           [0.00292208, 1.        ]])
    

    How it works

    We compare each value to the next one:

    x 
    # array([0, 1, 1, 0, 1, ...,  0, 0, 1, 0])
    
    # first to second-to-last value
    x[:-1]
    # array([0, 1, 1, 0, 1, ...,  0, 0, 1])
    
    # from the second value to the last
    x[1:]
    # array([1, 1, 0, 1, ...,  0, 0, 1, 0])