python scipy time-series cross-correlation

How do I perform cross-correlation between two time series in Python, and what transformations should I apply first?


I have two time series datasets, errors received and bookings received, recorded on a daily basis for three years (a few million rows). I wish to find out whether there is any relationship between them. As of now, I think that cross-correlation between these two series might help. In order to do so, should I apply any transformations first, such as making the series stationary, detrending, or deseasonalizing? If this approach is correct, I'm thinking of using "scipy.signal.correlate", but I really want to know how to interpret the result.


Solution

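  • scipy.signal.correlate computes the cross-correlation of two series. For series y1 and y2 of length N, correlate(y1, y2) in the default "full" mode returns a vector of length 2N - 1 that holds the correlation at every time lag: the k-th value (counting from zero) corresponds to a time lag of "k - N + 1", so the element at index N - 1 (the N-th element) is the similarity of the time series without time lag. The function does not normalize its output. If you first standardize each series (subtract the mean, divide by the standard deviation) and divide the result by N, that zero-lag value is close to one if y1 and y2 have similar trends and close to zero if the series are independent.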

    numpy.corrcoef takes two arrays and condenses the correlation into a single value, the Pearson correlation coefficient (the "time 0" value of the other routine, once that one is normalized). Given an input with M rows, it does so for every pair of rows and returns an M x M array of correlations. corrcoef normalizes the data (it removes each row's mean and divides by the standard deviations), so that the diagonal is 1 (the correlation of each series with itself).

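    Whether you need to enforce stationarity, detrend, or deseasonalize depends on your specific problem. The routines above operate on the "plain" numbers and take no account of what they mean, so a trend or seasonal cycle shared by both series will show up as correlation even if errors and bookings are otherwise unrelated.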
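
To make the lag indexing and the link to numpy.corrcoef concrete, here is a minimal sketch on synthetic data. The series, the 3-sample delay, and the noise level are all made up for illustration; they are not the asker's errors and bookings. It assumes SciPy 1.6 or later, which provides scipy.signal.correlation_lags.

```python
import numpy as np
from scipy import signal

# Synthetic stand-ins for the two daily series (assumption: illustration only).
rng = np.random.default_rng(0)
n = 500
true_delay = 3
y1 = rng.standard_normal(n)
# y2 is y1 shifted by true_delay samples, plus a little noise.
y2 = np.roll(y1, true_delay) + 0.1 * rng.standard_normal(n)

# Standardize each series so the cross-correlation is on a -1..1 scale.
z1 = (y1 - y1.mean()) / y1.std()
z2 = (y2 - y2.mean()) / y2.std()

# Full cross-correlation has 2n - 1 values; dividing by n normalizes it.
xcorr = signal.correlate(z1, z2, mode="full") / n
# Lag (in samples) that belongs to each entry of xcorr: -(n-1) ... (n-1).
lags = signal.correlation_lags(n, n, mode="full")

# The zero-lag entry sits at index n - 1 and equals the Pearson coefficient.
zero_lag = xcorr[n - 1]
pearson = np.corrcoef(y1, y2)[0, 1]

# The lag with the strongest correlation; its magnitude recovers the delay,
# and its sign follows SciPy's lag convention.
best_lag = lags[np.argmax(xcorr)]

print("zero-lag value:", zero_lag, "Pearson:", pearson)
print("lag of peak correlation:", best_lag)
```

With real data, a peak away from lag 0 would suggest that one series tends to lead the other by that many days. Detrending or differencing both series first is a reasonable precaution so that the peak is not merely a shared trend.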