I have two time series datasets, errors received and bookings received, recorded on a daily basis for three years (a few million rows). I want to find out whether there is any relationship between them. As of now, I think that cross-correlation between these two series might help. In order to do so, should I first apply any transformations, such as making the series stationary, detrending, or removing seasonality? If this approach is correct, I'm thinking of using "scipy.signal.correlate", but I would really like to know how to interpret the result.
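scipy.signal.correlate computes the cross-correlation of two time series. For series y1 and y2 of length N, correlate(y1, y2) returns, in its default "full" mode, a vector of 2N - 1 values that represents the lag-dependent correlation: the k-th value (counting from zero) corresponds to a time lag of "k - N + 1", so the element at index N - 1 (the N-th element) is the similarity of the two series without any time lag. The output is not normalized. If you first subtract the mean from each series and divide by its standard deviation, and then divide the result by N, the zero-lag value is the Pearson correlation: close to one if y1 and y2 vary together, close to zero if the series are unrelated.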
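As a minimal sketch of how the output can be read, the snippet below standardizes two series, calls scipy.signal.correlate, and builds the matching lag axis. The helper name normalized_xcorr and the synthetic errors/bookings data (bookings simply trail errors by 3 samples) are assumptions made for illustration, not part of the question's data.

```python
import numpy as np
from scipy import signal


def normalized_xcorr(y1, y2):
    """Return (lags, corr) for two equal-length 1-D series (hypothetical helper)."""
    y1 = np.asarray(y1, dtype=float)
    y2 = np.asarray(y2, dtype=float)
    n = len(y1)
    # Standardize so that the zero-lag value equals the Pearson coefficient.
    a = (y1 - y1.mean()) / y1.std()
    b = (y2 - y2.mean()) / y2.std()
    corr = signal.correlate(a, b, mode="full") / n
    # Index k of the "full" output corresponds to lag k - (n - 1).
    lags = np.arange(-(n - 1), n)
    return lags, corr


# Synthetic data, for illustration only: bookings repeat errors 3 samples later.
rng = np.random.default_rng(0)
errors = rng.normal(size=1000)
bookings = np.roll(errors, 3) + 0.1 * rng.normal(size=1000)

lags, corr = normalized_xcorr(bookings, errors)
best_lag = lags[np.argmax(corr)]  # lag at which the two series align best
print("lag with the strongest correlation:", best_lag)
```

With this argument order, a positive lag means the first series trails the second by that many samples. A clear peak at some lag suggests a delayed relationship, while a flat curve near zero suggests none.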
numpy.corrcoef takes two arrays and aggregates the correlation into a single value, the Pearson correlation (the "time 0" of the other routine, once the data are normalized). More generally, it does so for an input with N rows, returning an NxN array of pairwise correlations. corrcoef normalizes the data itself (it subtracts each row's mean and divides by the standard deviations), so the diagonal is 1 (the correlation of each series with itself).
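A short sketch of what numpy.corrcoef returns for two series, with the same coefficient computed by hand for comparison. The arrays x and y are synthetic stand-ins chosen for illustration.

```python
import numpy as np

# Synthetic data, for illustration only: y depends linearly on x plus noise.
rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 2.0 * x + rng.normal(size=500)

r = np.corrcoef(x, y)   # 2x2 matrix: [[1, r_xy], [r_xy, 1]]
pearson = r[0, 1]       # the off-diagonal entry is the Pearson coefficient

# The same quantity computed by hand from the centered series.
xc = x - x.mean()
yc = y - y.mean()
manual = (xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc))

print(r.shape, pearson, manual)
```

The off-diagonal entry is the number to look at; the diagonal only confirms that each series is perfectly correlated with itself.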
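The questions about stationarity, detrending, and deseasonalizing depend on your specific problem. The routines above operate on the "plain" data, without any consideration of what the values mean, so a trend or seasonal cycle shared by both series will show up as correlation unless you remove it first.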
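To illustrate why preprocessing can matter, here is a rough sketch with two synthetic series that are unrelated except for both growing over time. It compares the raw Pearson correlation with the value after linear detrending (scipy.signal.detrend) and after first differencing (numpy.diff). Whether either transformation suits the real errors and bookings data is an assumption that would need checking.

```python
import numpy as np
from scipy import signal

# Synthetic data, for illustration only: independent noise on two upward trends.
rng = np.random.default_rng(2)
t = np.arange(1000)
a = 0.01 * t + rng.normal(size=t.size)
b = 0.02 * t + rng.normal(size=t.size)

# The shared trend alone produces a large correlation on the raw data.
raw = np.corrcoef(a, b)[0, 1]

# Remove a least-squares linear trend from each series before correlating.
detrended = np.corrcoef(signal.detrend(a), signal.detrend(b))[0, 1]

# Alternative: correlate the day-to-day changes instead of the levels.
differenced = np.corrcoef(np.diff(a), np.diff(b))[0, 1]

print(raw, detrended, differenced)
```

Here the raw value is high while the detrended and differenced values stay near zero, which is the pattern to expect when the only thing two series share is a trend.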