Using ccf() with rollapply across variables in R


I have 3 columns of data, of which I use the first 10 rows in this example, as below:

set.seed(101)
inputx <- rnorm(1000,mean = 3,sd=2)
inputy <- rnorm(1000,mean = 2,sd=1)
inputz <- rnorm(1000,mean = 1,sd=3)
example <- cbind(inputx,inputy,inputz)

    > head(example,10)
        inputx      inputy     inputz
 [1,] 2.347927  2.50319581  4.4931430
 [2,] 4.104924 -0.09747067 -0.2836938
 [3,] 1.650112  1.90551542  0.9146087
 [4,] 3.428719  3.01454241  4.9332929
 [5,] 3.621538  1.92059955  2.4220865
 [6,] 5.347933  1.74487106  2.9122764
 [7,] 4.237580  2.78095054  7.8622898
 [8,] 2.774531  3.20741266 -1.5977934
 [9,] 4.834057  1.09214734 -0.5482315
[10,] 2.553481  0.59679215  0.5285020

My actual data has 10 variables in total but for simplicity of this example I just used 3.

For each ordered pair of columns listed below, I would like to calculate the ccf using a rolling window of size 4:

    inputx,inputx
    inputx,inputy
    inputx,inputz
    inputy,inputx
    inputy,inputy
    inputy,inputz
    inputz,inputx
    inputz,inputy
    inputz,inputz

e.g.

ccf(example[1:4,1],example[1:4,2]) 
ccf(example[2:5,1],example[2:5,2])
.
.
.
ccf(example[7:10,1],example[7:10,2])

As you can see, the sample above only works on columns 1 and 2, but I hope to do it for all pairs of columns using the rolling window approach.

The rolling window itself is easily handled by the rollapply function: you specify the window size and set by.column to FALSE so that the function receives all columns of each window together instead of being applied to each column separately.

I can roll apply a function such as mean to each column, as below, but looping across all combinations of columns is the part I can't figure out.

library(zoo)  # provides rollapply()
test <- rollapply(example[,c(1,2)],4,mean, by.column=TRUE)

The ccf results should be stored by row, and the output should have dimensions 10x9, since there are 10 rows in the original data and 9 ordered pairs passed to the ccf function. The first 3 rows of the output will be NA since the rolling window has size 4. The values below are for illustration only and are not the real output values.

output ->

     xx  xy  xz  yx  yy  yz  zx  zy  zz
[1,] NA  NA  NA  NA  NA  NA  NA  NA  NA  
[2,] NA  NA  NA  NA  NA  NA  NA  NA  NA 
[3,] NA  NA  NA  NA  NA  NA  NA  NA  NA  
[4,] .1  .2  .3  .2  .8  .5  .3  .5  .9 
[5,] .1  .2  .3  .2  .8  .5  .3  .5  .9  
[6,] .1  .2  .3  .2  .8  .5  .3  .5  .9 
[7,] .1  .2  .3  .2  .8  .5  .3  .5  .9  
[8,] .1  .2  .3  .2  .8  .5  .3  .5  .9 
[9,] .1  .2  .3  .2  .8  .5  .3  .5  .9  
[10,].1  .2  .3  .2  .8  .5  .3  .5  .9 

I would appreciate a little help with looping across all the pairs. I think that once I have that, I can add the rollapply wrapper to implement the sliding window.


Solution

  • 1) With default arguments, ccf(x, y) of two 4-vectors x and y gives a 7-vector (one value per lag from -3 to 3), so the output would have 3 * 3 * 7 = 63 columns rather than the 9 stated in the question.
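To see where the 7 comes from, here is a minimal sketch that calls ccf() on two arbitrary 4-vectors and inspects what it returns. The names cc, n_lags and lags are only for illustration; plot = FALSE is used so that nothing is drawn.

```r
# Two arbitrary 4-element inputs, matching the window size in the question
set.seed(101)
x <- rnorm(4)
y <- rnorm(4)

# plot = FALSE returns the "acf" object without drawing anything
cc <- ccf(x, y, plot = FALSE)

n_lags <- length(cc$acf)     # number of correlation values returned
lags   <- as.vector(cc$lag)  # the lags those values correspond to

print(n_lags)
print(lags)
```

With the default lag.max, each window therefore contributes one value per lag for every pair of columns, which is why a scalar-valued function is needed to get one column per pair.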

    In a comment, the poster stated that another function could be substituted for ccf, so below we use cov(x, y) instead: it outputs a scalar rather than a 7-vector and so gives a 10 x 9 output. In this particular case, cov applied to the 3-column window produces a 3x3 matrix, which gives a 9-vector when flattened with c().

    library(zoo)  # provides rollapplyr()
    rollapplyr(head(example, 10), 4, function(x) c(cov(x)), fill = NA, by.column = FALSE)
    

    giving the following 10x9 matrix:

               [,1]        [,2]       [,3]        [,4]      [,5]       [,6]       [,7]       [,8]      [,9]
     [1,]        NA          NA         NA          NA        NA         NA         NA         NA        NA
     [2,]        NA          NA         NA          NA        NA         NA         NA         NA        NA
     [3,]        NA          NA         NA          NA        NA         NA         NA         NA        NA
     [4,] 1.1990739 -0.72070179 -0.3951435 -0.72070179 1.8590569  3.1565993 -0.3951435  3.1565993  6.718376
     [5,] 1.1503463 -0.51712419  0.1548365 -0.51712419 1.6830055  2.6102211  0.1548365  2.6102211  5.058550
     [6,] 2.2854029 -0.12857123  1.1658204 -0.12857123 0.3413027  0.7821381  1.1658204  0.7821381  2.753662
     [7,] 0.7473036 -0.31336885 -0.2743693 -0.31336885 0.3923239  1.1959920 -0.2743693  1.1959920  6.109035
     [8,] 1.1727627 -0.53344663  2.2960862 -0.53344663 0.4851109 -0.5067012  2.2960862 -0.5067012 15.027672
     [9,] 1.2381071 -0.88053417  1.5728089 -0.88053417 0.9289009  0.7283704  1.5728089  0.7283704 18.179175
    [10,] 1.2353345 -0.05021654  1.7008923 -0.05021654 1.6116281  1.4902571  1.7008923  1.4902571 18.399713
    

    2) Alternatively, loop over the index pairs explicitly, which gives the same result:

    k <- ncol(example)
    g <- expand.grid(1:k, 1:k)
    Cov <- function(x) apply(g, 1, function(ix) cov(x[, ix[1]], x[, ix[2]]))
    rollapplyr(head(example, 10), 4, Cov, by.column = FALSE, fill = NA)
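The desired output in the question labels each column by its variable pair. As a sketch only, the rows of g can be reused to build such labels for the result of approach 2. The short names x, y, z and the variable res are assumptions made here, not part of the original answer. Note that expand.grid varies its first index fastest, so the labels come out in the order xx, yx, zx, xy, ... rather than the order listed in the question.

```r
library(zoo)  # provides rollapplyr()

# Same data as in the question
set.seed(101)
inputx <- rnorm(1000, mean = 3, sd = 2)
inputy <- rnorm(1000, mean = 2, sd = 1)
inputz <- rnorm(1000, mean = 1, sd = 3)
example <- cbind(inputx, inputy, inputz)

k <- ncol(example)
g <- expand.grid(1:k, 1:k)  # all ordered index pairs, first index varying fastest
Cov <- function(x) apply(g, 1, function(ix) cov(x[, ix[1]], x[, ix[2]]))
res <- rollapplyr(head(example, 10), 4, Cov, by.column = FALSE, fill = NA)

# Assumed short labels; each column is named after the row of g that produced it
nm <- c("x", "y", "z")
colnames(res) <- paste0(nm[g[[1]]], nm[g[[2]]])

print(colnames(res))
```

Because cov is symmetric, the xy and yx columns hold the same values, so the difference in ordering only matters once an asymmetric function is substituted.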
    

    Note that cov produces a symmetric matrix (the 3 lower triangular values equal the 3 upper triangular values), so we might wish to output only the diagonal and the upper triangular part. In that case we could use upper.tri to subset it.
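A minimal sketch of that subsetting, assuming a helper named CovUpper (a hypothetical name, not from the original answer). It keeps the diagonal and upper triangle of each window's covariance matrix, so each row has 6 values instead of 9.

```r
library(zoo)  # provides rollapplyr()

# Same data as in the question
set.seed(101)
inputx <- rnorm(1000, mean = 3, sd = 2)
inputy <- rnorm(1000, mean = 2, sd = 1)
inputz <- rnorm(1000, mean = 1, sd = 3)
example <- cbind(inputx, inputy, inputz)

# Hypothetical helper: covariance matrix of one window, reduced to
# its diagonal and upper-triangular entries (taken in column-major order)
CovUpper <- function(x) {
  v <- cov(x)
  v[upper.tri(v, diag = TRUE)]
}

res_upper <- rollapplyr(head(example, 10), 4, CovUpper,
                        by.column = FALSE, fill = NA)

print(dim(res_upper))
```

For 3 variables, the columns correspond to the pairs (x,x), (x,y), (y,y), (x,z), (y,z), (z,z), following the column-major order in which upper.tri selects entries.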