Tags: r, time-series, cross-correlation

Cross correlation of different time series in a matrix


While looking for a solution to my problem, I found an old post (Cross correlation of different time series data values in R) that asks exactly what I need. Unfortunately it didn't get any answers, so I am asking again in the hope of some guidance.

I have created a big matrix from a large number of time series of the same length, where each column is a different time series (similar to the following, but much bigger and with many more nonzero values):

      [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19]
[1,]    0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0    NA    NA    NA   0.0    NA   0.0   0.0   0.0   0.0
[2,]    0   6.0   0.0   9.0   0.0   0.0   0.0   0.0   0.0   0.0    NA     0    NA   0.0    NA   0.0   0.0   0.0   0.0
[3,]    0   0.0   0.0   5.0   0.0   0.0   0.0   0.0   0.0   0.0    NA     0    NA   0.0    NA   0.0   0.0   0.0   0.0
[4,]    0   0.0   0.0  10.0   0.0   0.0   0.0   0.0   0.0   0.0    NA     0    NA   0.0    NA   0.0   0.0   0.0   0.0
[5,]    0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0    NA     0    NA   0.0    NA   0.0   0.0   0.0   0.0
[6,]    0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0    NA     0    NA   0.0    NA   0.0   0.0   0.0   0.0
[7,]    0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0    NA     0    NA   0.0    NA   0.0   0.0   0.0   0.0
[8,]    0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0    NA     0    NA   0.0    NA   0.0   0.0   0.0   0.0
[9,]    0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0   0.0    NA     0    NA  10.0    NA   0.0   0.0   0.0   0.0
.
.
.

I want to determine the correlation between all of the time series. I put them in a matrix because I thought it would be the best structure for a cross-correlation procedure, but I might be wrong.

I also know about the functions ccf() and diss():

  1. ccf() # in base R (stats package)
  2. diss(meter_daywise, METHOD = "CORT", deltamethod = "DTW") # in the TSclust package

but, as in the old post, I run into the same issues:

  1. ccf() does not accept a full matrix as input.
  2. diss() takes a matrix as input and returns a dissimilarity matrix, but when I inspect the values I can see that it is not a cross-correlation matrix, because the values are not between -1 and 1.
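For reference, here is a minimal sketch on simulated data (not the matrix above) of what ccf() does with a single pair of series, and what happens when it is handed a matrix. Per the ccf() documentation, the lag k value estimates the correlation between x[t+k] and y[t]; here y is built to roughly follow x one step later, so the correlation should peak at lag -1. The variable names are illustrative only.

```r
# Simulated example: y is a noisy copy of x delayed by one time step
set.seed(42)
x <- rnorm(200)
y <- c(0, head(x, -1)) + rnorm(200, sd = 0.5)

# ccf() compares exactly two univariate series: one correlation per lag
cc <- ccf(x, y, lag.max = 5, plot = FALSE)
# cc$lag and cc$acf are 3-d arrays; drop() turns them into plain vectors
print(data.frame(lag = drop(cc$lag), cor = round(drop(cc$acf), 3)))

# Passing a matrix is rejected with an error, so a loop over column pairs is needed
err <- tryCatch(ccf(cbind(x, y), x, plot = FALSE),
                error = function(e) conditionMessage(e))
print(err)
```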

So the question is: how do I compute cross-correlations between different time series in R?


Solution

  • ccf() returns the pairwise correlation at each offset (i.e. lag), but I think what you want is the correlation with the largest absolute value across all lags, keeping its sign. Because your data contain NAs, you also need to set the na.action argument.

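    set.seed(1)                                 # reproducible example data
    mat <- matrix(rnorm(100000), ncol = 100)    # 1000 time points x 100 series
    mat[sample(seq_along(mat), 100)] <- NA      # sprinkle in some missing values
    
    # For every pair of columns, keep the cross-correlation with the
    # largest absolute value over all lags (sign preserved)
    res <- sapply(seq_len(ncol(mat)), function(x) {
      sapply(seq_len(ncol(mat)), function(z) {
        resTmp <- ccf(x = mat[, x], y = mat[, z], plot = FALSE, na.action = na.pass)
        resTmp$acf[which.max(abs(resTmp$acf))]
      })
    })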
    

    From the ccf help:

    By default, no missing values are allowed. If the na.action function passes through missing values (as na.pass does), the covariances are computed from the complete cases. This means that the estimate computed may well not be a valid autocorrelation sequence, and may contain missing values.
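If the pairwise loop is slow for a large matrix, a possible alternative (a sketch, not benchmarked here) is a single acf() call on the whole matrix. In base R, ccf() is implemented on top of acf() applied to a two-column series, and for a multivariate series acf() returns an array A in which A[k + 1, i, j] estimates the correlation between column i at time t + k and column j at time t. The helper name cross_cor_all() is hypothetical; it collects, for each pair, the correlation with the largest absolute value and the lag where it occurs, using the same sign convention as ccf(x = mat[, i], y = mat[, j]). Note that acf() picks its default lag.max from the number of columns, so lag.max is passed explicitly; use the same value in ccf() if you want to compare the two.

```r
# Hypothetical helper: all pairwise cross-correlations from one acf() call.
# Assumes acf() on a matrix gives, for each pair (i, j), the same values
# that ccf(mat[, i], mat[, j]) computes internally.
cross_cor_all <- function(mat, lag.max) {
  # A[k + 1, i, j] estimates cor(mat[t + k, i], mat[t, j]) for k = 0..lag.max
  A <- acf(mat, lag.max = lag.max, plot = FALSE, na.action = na.pass)$acf
  n <- ncol(mat)
  # Lags matching the values collected in v below
  lags <- c(0:lag.max, -seq_len(lag.max))
  best <- best_lag <- matrix(NA_real_, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # Positive lags come from A[, i, j], negative lags from A[-1, j, i]
      v <- c(A[, i, j], A[-1, j, i])
      k <- which.max(abs(v))
      if (length(k) == 0L) next  # every lag was NA for this pair
      best[i, j] <- v[k]
      best_lag[i, j] <- lags[k]
    }
  }
  list(cor = best, lag = best_lag)
}

# Example on simulated data with a few missing values
set.seed(1)
mat <- matrix(rnorm(10000), ncol = 10)
mat[sample(seq_along(mat), 20)] <- NA
out <- cross_cor_all(mat, lag.max = 10)
print(round(out$cor[1:3, 1:3], 3))
print(out$lag[1:3, 1:3])
```

With this convention, out$lag[i, j] = k means column i at time t + k lines up best with column j at time t, so a negative k suggests that column i leads column j. The caveat from the help page about na.pass applies here as well.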