While looking for a solution to my problem I found an old post (Cross correlation of different time series data values in R) that asks for exactly what I need. Unfortunately it didn't get any answers, so I am asking again in the hope of some guidance.
I have built a large matrix from many time series of the same length. Each column is a different time series (similar to the following, but much bigger and with many more non-zero values):
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19]
[1,] 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 NA NA NA 0.0 NA 0.0 0.0 0.0 0.0
[2,] 0 6.0 0.0 9.0 0.0 0.0 0.0 0.0 0.0 0.0 NA 0 NA 0.0 NA 0.0 0.0 0.0 0.0
[3,] 0 0.0 0.0 5.0 0.0 0.0 0.0 0.0 0.0 0.0 NA 0 NA 0.0 NA 0.0 0.0 0.0 0.0
[4,] 0 0.0 0.0 10.0 0.0 0.0 0.0 0.0 0.0 0.0 NA 0 NA 0.0 NA 0.0 0.0 0.0 0.0
[5,] 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 NA 0 NA 0.0 NA 0.0 0.0 0.0 0.0
[6,] 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 NA 0 NA 0.0 NA 0.0 0.0 0.0 0.0
[7,] 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 NA 0 NA 0.0 NA 0.0 0.0 0.0 0.0
[8,] 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 NA 0 NA 0.0 NA 0.0 0.0 0.0 0.0
[9,] 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 NA 0 NA 10.0 NA 0.0 0.0 0.0 0.0
.
.
.
I want to determine the correlation between all the time series. I put them in a matrix because I thought that would be the best way to run a cross-correlation procedure, but I might be wrong.
I also know about the functions ccf() and diss(), but, as in the old post, I run into the same issues.
So the question is: how do we compute cross-correlation between different time series in R?
ccf returns the pairwise correlation at each offset (i.e. lag), but I think what you want is max(abs(correlation)) from that. Because you have NAs, you need to set the na.action argument.
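# Simulated data: 100 series of length 1000, with 100 random values set to NA
mat <- matrix(rnorm(100000), ncol = 100)
mat[sample(length(mat), 100)] <- NA
# For every pair of columns, keep the cross-correlation with the largest absolute value
res <- sapply(seq_len(ncol(mat)), function(x) {
  sapply(seq_len(ncol(mat)), function(z) {
    resTmp <- ccf(x = mat[, x], y = mat[, z], plot = FALSE, na.action = na.pass)
    resTmp$acf[which.max(abs(resTmp$acf))]
  })
})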
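If you also want to know at which lag the peak occurs, something along these lines might work. It is only a sketch: max_ccf is a made-up helper name, and the tryCatch guard and the choice to leave NA for degenerate pairs are my assumptions, not anything ccf does for you. It relies on the fact that swapping x and y in ccf mirrors the lags, so each pair is only computed once. It also leaves NA for columns that are entirely NA or constant (your example matrix has both), where which.max would otherwise return nothing.

```r
# Hypothetical helper: peak cross-correlation and its lag for every column pair.
max_ccf <- function(mat, lag.max = NULL) {
  n <- ncol(mat)
  peak <- matrix(NA_real_, n, n)
  lag  <- matrix(NA_real_, n, n)
  for (i in seq_len(n)) {
    for (j in i:n) {
      # Guard against ccf() failing on degenerate input (assumption: skip such pairs)
      cc <- tryCatch(
        ccf(mat[, i], mat[, j], lag.max = lag.max, plot = FALSE, na.action = na.pass),
        error = function(e) NULL
      )
      if (is.null(cc)) next
      k <- which.max(abs(cc$acf))
      if (length(k) == 0) next        # all-NA or constant series: leave NA
      peak[i, j] <- peak[j, i] <- cc$acf[k]
      lag[i, j] <- cc$lag[k]
      lag[j, i] <- -cc$lag[k]         # swapping x and y mirrors the lag
    }
  }
  list(peak = peak, lag = lag)
}

# Small demo: column 2 is column 1 shifted by three steps, column 3 is all NA
set.seed(1)
x <- rnorm(200)
demo <- cbind(x, c(rnorm(3), x[1:197]), NA_real_)
res <- max_ccf(demo)
res$peak
res$lag
```

With many series you probably want to pass a smaller lag.max so that only plausible offsets are searched.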
From the ccf help:
By default, no missing values are allowed. If the na.action function passes through missing values (as na.pass does), the covariances are computed from the complete cases. This means that the estimate computed may well not be a valid autocorrelation sequence, and may contain missing values.
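To see what that means in practice, here is a small check on made-up data (the vectors x and y and the positions of the NAs are just assumptions for illustration). With the default na.action the call stops with an error, while na.pass lets it run and estimates each lag from the pairs that are actually observed.

```r
set.seed(42)
x <- rnorm(200)
y <- rnorm(200)
x[c(10, 50)] <- NA

# Default na.action = na.fail: ccf() stops with an error on missing values
default_fails <- inherits(try(ccf(x, y, plot = FALSE), silent = TRUE), "try-error")
default_fails

# na.pass keeps the NAs and computes each lag from the complete pairs
cc <- ccf(x, y, plot = FALSE, na.action = na.pass)
range(cc$acf, na.rm = TRUE)
```

Keep the caveat from the help page in mind: with many NAs the values are no longer guaranteed to form a valid correlation sequence, so treat the peaks as rough indicators.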