python, numpy, cross-correlation

How to find the lag between two time series using cross-correlation


Say the two series are:

x = [4,4,4,4,6,8,10,8,6,4,4,4,4,4,4,4,4,4,4,4,4,4,4]
y = [4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,6,8,10,8,6,4,4]

Series y clearly lags x by 12 time periods: the pulse in y starts 12 samples after the pulse in x. However, using the following code, as suggested in Python cross correlation:

import numpy as np
c = np.correlate(x, y, "full")
lag = np.argmax(c) - c.size/2

leads to an incorrect lag of -0.5.
What's wrong here?


Solution

  • If you want to do it the easy way, simply use scipy.signal.correlation_lags.

    Also, remember to subtract the mean from the inputs.

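    import numpy as np
    from scipy import signal

    x = np.array([4,4,4,4,6,8,10,8,6,4,4,4,4,4,4,4,4,4,4,4,4,4,4])
    y = np.array([4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,6,8,10,8,6,4,4])
    # Correlate the mean-removed signals over every possible overlap
    correlation = signal.correlate(x - np.mean(x), y - np.mean(y), mode="full")
    # Lag value for each entry of `correlation` (correlation_lags needs SciPy >= 1.6)
    lags = signal.correlation_lags(len(x), len(y), mode="full")
    lag = lags[np.argmax(np.abs(correlation))]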
    

    This gives lag = -12, which is the difference between the index of the first 6 in x (index 4) and the index of the first 6 in y (index 16). If you swap the inputs, it gives +12.

    Edit

    Why subtract the mean

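    If the signals have a non-zero mean, the terms near the center of the correlation become larger, because at small lags more samples overlap and contribute to the sum. The constant part alone contributes mean(x)*mean(y) times the number of overlapping samples, a triangle that peaks at lag 0. Furthermore, for very large data, subtracting the mean can make the floating-point computation more accurate.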

    The plot below illustrates what happens in this example if the mean is not subtracted.

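    import matplotlib.pyplot as plt  # x, y, np, signal and correlation as defined above
    plt.plot(np.abs(correlation))                          # blue
    plt.plot(np.abs(signal.correlate(x, y, mode="full")))  # orange
    plt.plot(np.abs(signal.correlate(np.ones_like(x)*np.mean(x), np.ones_like(y)*np.mean(y))))  # green
    plt.legend(['subtracting mean', 'keeping the mean', 'constant signal'])
    plt.show()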
    

    [Figure: absolute cross-correlation for the three cases in the legend: subtracting mean, keeping the mean, constant signal]

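    Notice that the maximum of the blue curve (subtracting the mean), at index 10, which corresponds to a lag of -12, does not coincide with the maximum of the orange curve (keeping the mean), which sits at the center of the correlation, i.e. at lag 0.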
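The same check can be done with NumPy alone, which also shows what goes wrong in the question's code. For np.correlate(x, y, "full"), output index k corresponds to lag k - (len(y) - 1). For two series of equal length N, c.size/2 equals N - 0.5 instead of N - 1, and without removing the mean the raw correlation peaks at lag 0, which together give the -0.5. This is a minimal sketch assuming 1-D inputs; the helper names full_mode_lags and estimate_lag are hypothetical, not part of NumPy or SciPy.

```python
import numpy as np

# Series from the question
x = np.array([4,4,4,4,6,8,10,8,6,4,4,4,4,4,4,4,4,4,4,4,4,4,4], dtype=float)
y = np.array([4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,4,6,8,10,8,6,4,4], dtype=float)


def full_mode_lags(n_x, n_y):
    """Hypothetical helper: lag for each output index of np.correlate(x, y, "full").

    Index k corresponds to lag k - (n_y - 1), so lags run from -(n_y - 1)
    to n_x - 1, the same values scipy.signal.correlation_lags gives for "full".
    """
    return np.arange(-(n_y - 1), n_x)


def estimate_lag(a, b, subtract_mean=True):
    """Hypothetical helper: lag that maximizes |cross-correlation| of a and b."""
    if subtract_mean:
        a = a - a.mean()
        b = b - b.mean()
    c = np.correlate(a, b, "full")
    lags = full_mode_lags(len(a), len(b))
    return int(lags[np.argmax(np.abs(c))])


print(estimate_lag(x, y))                       # mean removed    -> -12
print(estimate_lag(y, x))                       # inputs swapped  -> 12
print(estimate_lag(x, y, subtract_mean=False))  # mean kept       -> 0
```

Swapping the inputs only flips the sign of the lag, while keeping the mean pulls the estimate to lag 0 for this data, matching the orange curve in the plot.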