fable-r tidyverts

How to select large number of variables to pass to fable::VAR


Why doesn't the : operator work when passing variables to fable::VAR? I have a large number of variables. Listing them inside vars() separated by commas works fine, but specifying them as a range with : breaks. Is there a way to specify a large number of variables without listing each one individually?

See example code:

library(fable)
library(tidyverse)

lung_deaths <- cbind(mdeaths, fdeaths) %>%
  as_tsibble(pivot_longer = FALSE)

lung_deaths %>%
  model(VAR(vars(mdeaths:fdeaths) ~ AR(1))) %>% 
  tidy()

This produces the following warning:

Warning message:
In mdeaths:fdeaths :
  numerical expression has 72 elements: only the first used

And the output makes no sense:

# A tibble: 2 × 7
  .model                             term                   .response       estimate std.error statistic p.value
  <chr>                              <chr>                  <chr>              <dbl>     <dbl>     <dbl>   <dbl>
1 VAR(vars(mdeaths:fdeaths) ~ AR(1)) lag(mdeaths:fdeaths,1) mdeaths:fdeaths     1.00  9.52e-17   1.05e16       0
2 VAR(vars(mdeaths:fdeaths) ~ AR(1)) constant    

Compare with the correct result using vars(mdeaths,fdeaths):

# A tibble: 6 × 7
  .model                              term           .response estimate std.error statistic p.value
  <chr>                               <chr>          <chr>        <dbl>     <dbl>     <dbl>   <dbl>
1 VAR(vars(mdeaths, fdeaths) ~ AR(1)) lag(mdeaths,1) mdeaths     0.872      0.362    2.41   0.0185 
2 VAR(vars(mdeaths, fdeaths) ~ AR(1)) lag(fdeaths,1) mdeaths    -0.281      0.871   -0.322  0.748  
3 VAR(vars(mdeaths, fdeaths) ~ AR(1)) constant       mdeaths   337.       126.       2.68   0.00925
4 VAR(vars(mdeaths, fdeaths) ~ AR(1)) lag(mdeaths,1) fdeaths     0.299      0.150    2.00   0.0500 
5 VAR(vars(mdeaths, fdeaths) ~ AR(1)) lag(fdeaths,1) fdeaths     0.0253     0.361    0.0700 0.944  
6 VAR(vars(mdeaths, fdeaths) ~ AR(1)) constant       fdeaths    93.5       52.2      1.79   0.0776 

Solution
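
The warning in the question is the one base R's : operator emits when its operands are vectors rather than single numbers. A minimal sketch in plain R, using two made-up vectors m and f (hypothetical stand-ins for the mdeaths and fdeaths columns), shows the same behaviour outside of fable:

```r
# Two made-up numeric vectors standing in for the two columns
m <- c(3, 10, 20)
f <- c(6, 11, 21)

# Base R's `:` builds a sequence from the FIRST element of each operand
# and warns that the remaining elements are ignored.
msgs <- character()
res <- withCallingHandlers(
  m:f,
  warning = function(w) {
    # Record the warning text, then continue evaluating
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(res)   # the sequence from m[1] to f[1], i.e. 3 4 5 6
print(msgs)  # warning text(s) about only the first element being used
```

So the range is never interpreted as "all columns from mdeaths to fdeaths"; it collapses to a single numeric sequence, which would explain the single-response output shown in the question.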

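  • Inside vars(), mdeaths:fdeaths is not treated as a tidyselect range. It is evaluated as an ordinary R expression, so base R's : sequence operator is applied to the two columns, which is what the warning reports. Currently you need to construct the formula's left-hand side (lhs) yourself, which can be done programmatically. There are plans to allow tidyselect-style selectors like mdeaths:fdeaths, with a working version here: https://github.com/tidyverts/fabletools/pull/361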

    Here's how you can currently select multiple response variables with tidyselect and build the model formula from them programmatically.

    library(fable)
    #> Loading required package: fabletools
    library(tidyverse)
    
    lung_deaths <- cbind(mdeaths, fdeaths) %>%
      as_tsibble(pivot_longer = FALSE)
    
    # Select the columns you want with tidyselect syntax
    # (returns a named integer vector of column positions)
    cols <- tidyselect::eval_select(rlang::expr(mdeaths:fdeaths), data = lung_deaths)
    
    # Create the formula: lhs is vars(<selected columns>), rhs is the AR(1) special
    fm <- rlang::new_formula(
      lhs = rlang::call2("vars", !!!rlang::syms(names(cols))), 
      rhs = rlang::expr(AR(1))
    )
    
    # Use formula in model
    lung_deaths %>%
      model(VAR(fm)) %>% 
      tidy()
    #> # A tibble: 6 × 7
    #>   .model  term           .response estimate std.error statistic p.value
    #>   <chr>   <chr>          <chr>        <dbl>     <dbl>     <dbl>   <dbl>
    #> 1 VAR(fm) lag(mdeaths,1) mdeaths     0.872      0.362    2.41   0.0185 
    #> 2 VAR(fm) lag(fdeaths,1) mdeaths    -0.281      0.871   -0.322  0.748  
    #> 3 VAR(fm) constant       mdeaths   337.       126.       2.68   0.00925
    #> 4 VAR(fm) lag(mdeaths,1) fdeaths     0.299      0.150    2.00   0.0500 
    #> 5 VAR(fm) lag(fdeaths,1) fdeaths     0.0253     0.361    0.0700 0.944  
    #> 6 VAR(fm) constant       fdeaths    93.5       52.2      1.79   0.0776
    

    Created on 2023-12-30 with reprex v2.0.2
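
If the same pattern is needed in several places, the two steps above can be wrapped in a small helper. The sketch below is only an illustration: var_formula is a hypothetical name, not part of fable or fabletools, and the toy data frame with columns y1 to y5 is made up. It uses only rlang and tidyselect, so any tidyselect selector (ranges, starts_with(), and so on) can be used to pick the response columns.

```r
library(rlang)
library(tidyselect)

# Hypothetical helper (not part of fable/fabletools):
# builds `vars(<selected columns>) ~ <rhs>` from a tidyselect expression.
var_formula <- function(data, selection, rhs = expr(AR(1))) {
  # Environment of the caller, so the formula is not tied to this function
  env <- caller_env()
  # Resolve the selection against the column names of `data`
  cols <- eval_select(enquo(selection), data = data)
  new_formula(
    lhs = call2("vars", !!!syms(names(cols))),
    rhs = rhs,
    env = env
  )
}

# Made-up data with several response columns
toy <- data.frame(
  index = 1:3,
  y1 = c(1, 2, 3), y2 = c(2, 3, 4), y3 = c(3, 4, 5),
  y4 = c(4, 5, 6), y5 = c(5, 6, 7)
)

fm_range  <- var_formula(toy, y1:y3)            # range selection
fm_prefix <- var_formula(toy, starts_with("y")) # helper-based selection

print(fm_range)
print(fm_prefix)
```

With the tsibble from the answer, the resulting formula would be passed to the model in the same way as fm above, for example fm <- var_formula(lung_deaths, mdeaths:fdeaths) followed by model(VAR(fm)).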