
R: how to format data before applying cross-correlation


I want to compute the cross-correlation of two time series. The data sets look like this:

> head(link420)
                  time diff420
1: 2018-01-01 08:00:00   18.50
2: 2018-01-01 08:05:00    0.00
3: 2018-01-01 08:10:00   -4.25
4: 2018-01-01 08:15:00    4.25
5: 2018-01-01 08:20:00   -8.50
6: 2018-01-01 08:25:00   47.00


> head(link423)
                  time    diff423
1: 2018-01-01 08:00:00   0.000000
2: 2018-01-01 08:05:00   1.700000
3: 2018-01-01 08:10:00 -22.818182
4: 2018-01-01 08:15:00  23.272727
5: 2018-01-01 08:20:00   4.160839
6: 2018-01-01 08:25:00  -9.337607

The structure of the two data sets is:

> str(link420)
Classes ‘data.table’ and 'data.frame':  31 obs. of  2 variables:
 $ time   : POSIXct, format: "2018-01-01 08:00:00" "2018-01-01 08:05:00" "2018-01-01 08:10:00" "2018-01-01 08:15:00" ...
 $ diff420: num  18.5 0 -4.25 4.25 -8.5 47 -20 4 -5 -27 ...
 - attr(*, ".internal.selfref")=<externalptr> 


> str(link423)
Classes ‘data.table’ and 'data.frame':  31 obs. of  2 variables:
 $ time   : POSIXct, format: "2018-01-01 08:00:00" "2018-01-01 08:05:00" "2018-01-01 08:10:00" "2018-01-01 08:15:00" ...
 $ diff423: num  0 1.7 -22.82 23.27 4.16 ...
 - attr(*, ".internal.selfref")=<externalptr> 

How should I change the format of these data sets?

When I try
ccf(link420,link423)

it returns an error:

Error in dimnames(x) <- dn : 
  length of 'dimnames' [2] not equal to array extent

So I tried

link420<-xts(x = link420, order.by = link420$time)
link420<-link420[,c(2)]

link423<-xts(x = link423, order.by = link423$time)
link423<-link423[,c(2)]


but this still gives an error:

ccf(link420,link423)
Error in ccf(link420, link423) : univariate time series only

I want to find out at which time periods (in 5-minute intervals) these two data sets are correlated. Can I get some help?


Solution

  • You might find it easier to use tsibble objects, as in this example with simulated data.

    library(tidyverse)
    library(lubridate)
    library(tsibble)
    library(feasts)
    
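    # Simulated stand-ins for the two series: 100 observations, 5 minutes apart
    link420 <- tibble(
        time = seq(as.POSIXct("2018-01-01 08:00:00"), length.out = 100, by = "5 min"),
        diff420 = rnorm(100)
      ) %>%
      as_tsibble(index = time)
    link423 <- tibble(
        time = seq(as.POSIXct("2018-01-01 08:00:00"), length.out = 100, by = "5 min"),
        diff423 = rnorm(100)
      ) %>%
      as_tsibble(index = time)
    
    # Join on the shared time index, then compute the cross-correlation at each lag
    inner_join(link420, link423, by = "time") %>%
      CCF(diff420, diff423)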
    #> # A tsibble: 33 x 2 [5m]
    #>      lag     ccf
    #>    <lag>   <dbl>
    #>  1  -80m  0.0648
    #>  2  -75m -0.0651
    #>  3  -70m -0.0316
    #>  4  -65m  0.0679
    #>  5  -60m  0.0635
    #>  6  -55m -0.158 
    #>  7  -50m  0.0444
    #>  8  -45m  0.0497
    #>  9  -40m  0.0267
    #> 10  -35m -0.0503
    #> # … with 23 more rows
    

    Created on 2020-11-02 by the reprex package (v0.3.0)
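
If you would rather stay in base R, the following is a minimal sketch of the same idea. The original `ccf()` calls seem to fail because whole two-column tables were passed, whereas `ccf()` expects one numeric vector (or univariate series) per argument. The data here are simulated and not the question's values: `diff423` is deliberately built as `diff420` delayed by three steps so that the peak lag is known in advance. The names `both`, `res`, `best_lag_steps` and `best_lag_minutes` are illustrative only, and the factor of 5 assumes regular 5-minute spacing with no gaps.

```r
# Simulated stand-ins for the question's tables (values are made up)
set.seed(1)
time <- seq(as.POSIXct("2018-01-01 08:00:00", tz = "UTC"),
            length.out = 100, by = "5 min")
link420 <- data.frame(time = time, diff420 = rnorm(100))

# Assumption for illustration: diff423 is diff420 delayed by 3 steps
# (15 minutes) plus a little noise
link423 <- data.frame(
  time = time,
  diff423 = c(rnorm(3), head(link420$diff420, -3)) + rnorm(100, sd = 0.1)
)

# Align the two tables on their timestamps so both columns cover the same instants
both <- merge(link420, link423, by = "time")

# Pass the numeric columns only, not the whole tables
res <- ccf(both$diff420, both$diff423, lag.max = 12, plot = FALSE)

# Lag (in steps) with the largest absolute correlation, converted to minutes
k <- which.max(abs(res$acf))
best_lag_steps <- res$lag[k]
best_lag_minutes <- best_lag_steps * 5
```

The lag `k` reported by `ccf(x, y)` estimates the correlation between `x[t+k]` and `y[t]`, so a negative peak lag means the first series leads the second. With the simulated delay above, the peak falls at -3 steps, i.e. -15 minutes. On real data, `plot(res)` also draws approximate confidence bounds, which help in judging whether a peak is distinguishable from noise.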