Tags: r, time-series, purrr

How to keep all elements of a list that exceed an individual ACF cut-off value in R?


I would like to keep all cases in a list that exceed an individual cut-off value. The list contains 650 individual cross-correlations computed with the ccf() function.

str(d_posts_ccf_10_10)
List of 650
 $ Europa_Teles_BTR    :List of 6
  ..$ acf   : num [1:61, 1, 1] 0.01628 -0.00581 -0.04069 -0.16275 0.35689 ...
  ..$ type  : chr "correlation"
  ..$ n.used: int 148
  ..$ lag   : num [1:61, 1, 1] -30 -29 -28 -27 -26 -25 -24 -23 -22 -21 ...
  ..$ series: chr "X"
  ..$ snames: chr ".$Posts_game & .$Posts_stop"
  ..- attr(*, "class")= chr "acf"
 $ Polonpo             :List of 6
  ..$ acf   : num [1:61, 1, 1] -0.05826 0.13355 -0.06989 -0.00596 -0.05827 ...
  ..$ type  : chr "correlation"
  ..$ n.used: int 127
  ..$ lag   : num [1:61, 1, 1] -30 -29 -28 -27 -26 -25 -24 -23 -22 -21 ...
  ..$ series: chr "X"
  ..$ snames: chr ".$Posts_game & .$Posts_stop"
  ..- attr(*, "class")= chr "acf"
 $ derchefz            :List of 6
  ..$ acf   : num [1:61, 1, 1] 0 0.0587 -0.0744 0.2663 -0.268 ...
  ..$ type  : chr "correlation"
  ..$ n.used: int 143
  ..$ lag   : num [1:61, 1, 1] -30 -29 -28 -27 -26 -25 -24 -23 -22 -21 ...
  ..$ series: chr "X"
  ..$ snames: chr ".$Posts_game & .$Posts_stop"
  ..- attr(*, "class")= chr "acf"
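To experiment without the original data, a list with the same structure can be simulated. This is a hypothetical sketch: the element names are copied from the str() output above, but the series, their lengths and the 4-step delay between them are made up. Only stats::ccf() with lag.max = 30 and plot = FALSE comes from base R, and it returns objects with the same acf, lag and n.used fields shown above.

```r
# Hypothetical toy data with the same structure as d_posts_ccf_10_10:
# each element is the "acf" object returned by stats::ccf().
set.seed(42)
n_obs <- c(Europa_Teles_BTR = 148, Polonpo = 127, derchefz = 143)

toy_ccf <- lapply(n_obs, function(n) {
  posts_game <- rnorm(n)
  # made-up relationship: posts_stop follows posts_game with a 4-step delay
  posts_stop <- c(rnorm(4), head(posts_game, n - 4)) + rnorm(n, sd = 0.5)
  # lag.max = 30 gives lags -30..30, i.e. 61 values per element
  ccf(posts_game, posts_stop, lag.max = 30, plot = FALSE)
})

# Top-level structure: a named list of "acf" objects
str(toy_ccf, max.level = 1)
```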

Every case has its own number of used observations (n.used). I am interested in the ACF values, and I would like to keep all cases where at least one ACF value lies outside ±2/√T, "where T is the length of the time series" (I guess that is n.used). The reason for this procedure is that I would like to find all significant lags without visually inspecting the ACF plots, since there are about 650 cases. I'd really appreciate some help or advice on this one!
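As a worked instance of that bound, taking T = n.used (this is also what stats::plot.acf() uses for its default dashed band, qnorm((1 + ci)/2)/sqrt(n.used) with ci = 0.95, i.e. about 1.96/sqrt(n.used)), the first case above gives:

$$
|r_k| > \frac{2}{\sqrt{T}}, \qquad T = 148:\quad \frac{2}{\sqrt{148}} \approx \frac{2}{12.166} \approx 0.164
$$

So for Europa_Teles_BTR the value 0.35689 at lag -26 lies outside the band and the case would be kept, while -0.16275 at lag -27 falls just inside it.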

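    library(purrr)
    
    # keep() needs a predicate that returns a single TRUE/FALSE per element,
    # so the comparison against ±2/sqrt(n.used) is wrapped in any(abs(...))
    test_499 <- d_posts_ccf_10_10 %>%
      keep(~ any(abs(.x$acf) > 2 / sqrt(.x$n.used)))
    
    # Same filter via a logical index; n.used must still be available,
    # so map over the whole ccf objects rather than only .x$acf
    sig_index <- map_lgl(d_posts_ccf_10_10, ~ any(abs(.x$acf) > 2 / sqrt(.x$n.used)))
    test_500 <- d_posts_ccf_10_10[sig_index]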

Solution

  • There may be more elegant solutions, but one base R approach is to first establish the list-specific cutoff values, then combine lapply and sapply to create a logical vector indicating whether the cutoff falls within the range of each element's ACF values.

    It's difficult to be sure this will work with your exact data without a reproducible example, but suppose your data look like this (the second element does not meet the criterion and should be removed, while the first and third should be kept):

    have_list <- list(list1 = list(n.used = 123,
                                   acf = c(-1, 0.2, 0.3, 12),
                                   ignore = LETTERS),
                      list2 = list(n.used = 321,
                                   acf = seq(10, 20, 0.1),
                                   ignore = letters),
                      list3 = list(n.used = 111,
                                   acf = seq(-1, 1, 0.01),
                                   ignore = 1:26))
    

    You can then try the approach described above like this:

    # create cutoffs
    cutoffs <- unlist(lapply(have_list, function(x) 2 / sqrt(x[["n.used"]])))
    
    #     list1     list2     list3 
    # 0.1803339 0.1116291 0.1898316 
    
    # create keep index, comparing each element's own cutoff
    # with the range of its own ACF values
    keep_index <- sapply(names(have_list), function(nm) {
      acf_vals <- have_list[[nm]][["acf"]]
      cutoffs[[nm]] >= min(acf_vals) & cutoffs[[nm]] <= max(acf_vals)
      # Or with dplyr:
      # dplyr::between(cutoffs[[nm]], min(acf_vals), max(acf_vals))
    })
    
    # list1 list2 list3 
    #  TRUE FALSE  TRUE 
    
    new_list <- have_list[keep_index]
    # keeps only list1 and list3
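The range check above keeps an element when its cutoff lies between the smallest and largest ACF value. If, as the question describes, a case should be kept when at least one ACF value lies outside ±2/√n.used, the test becomes any(abs(acf) > cutoff). Below is a minimal base R sketch of that variant, which also returns the significant lags of each kept case. The helper names has_significant_lag() and significant_lags() and the small example list are hypothetical.

```r
# Hypothetical helper: TRUE if any lag of a ccf/acf object lies outside
# ±2/sqrt(n.used), the bound described in the question.
has_significant_lag <- function(x) {
  any(abs(x[["acf"]]) > 2 / sqrt(x[["n.used"]]))
}

# Hypothetical helper: the lags whose values lie outside that bound.
significant_lags <- function(x) {
  bound <- 2 / sqrt(x[["n.used"]])
  x[["lag"]][abs(x[["acf"]]) > bound]
}

# Made-up example: with n.used = 100 the bound is 0.2,
# so only "b" has values beyond it.
ccf_list <- list(
  a = list(acf = c(0.05, -0.10, 0.12), lag = -1:1, n.used = 100),
  b = list(acf = c(0.05, 0.35, -0.25), lag = -1:1, n.used = 100)
)

keep_index <- vapply(ccf_list, has_significant_lag, logical(1))
kept <- ccf_list[keep_index]
sig <- lapply(kept, significant_lags)  # named list: lags per kept case

# For the real list this would be:
# keep_index <- vapply(d_posts_ccf_10_10, has_significant_lag, logical(1))
```

Because ccf() stores acf and lag as [61, 1, 1] arrays, the logical subsetting in significant_lags() still returns a plain vector of lags. Note that this criterion differs from the range check: applied to have_list it keeps all three elements, because every ACF value in list2 is far above its cutoff. Also keep in mind that with 61 lags per case, roughly one lag in twenty can be expected to cross an approximately 95% bound by chance alone, so a single exceedance is weak evidence by itself.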