r, model, time-series, tidyverse, forecasting

Multivariate time series - is there notation to select all the variables, or do they all have to be written out?


I'm building a multivariate time series model to make predictions about labor in the United States. The fpp3 package is excellent, but I don't see a notation for including all the variables in a model.

For example, in linear regression, it's possible to do this:

library(tidyverse)
mtcars.lm <-  lm(mpg ~ ., data = mtcars)
summary(mtcars.lm)

to model mpg on all the remaining variables, without having to write all the variables out explicitly. Is there something similar for time series using the fpp3 package?

For example, this returns an error:

library(tidyverse)
library(fpp3)
library(clock)

# Source: https://beta.bls.gov/dataViewer/view/timeseries/CES0000000001
All_Employees <- read_csv('https://raw.githubusercontent.com/InfiniteCuriosity/predicting_labor/main/All_Employees.csv', col_select = c(Label, Value), show_col_types = FALSE)
All_Employees <- All_Employees %>%
  rename(Month = Label, Total_Employees = Value)
All_Employees <- All_Employees %>%
  mutate(Month = yearmonth(Month)) %>% 
  as_tsibble(index = Month) %>% 
  mutate(Total_Employees_Diff = difference(Total_Employees))

index = All_Employees$Month

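# start_month and end_month must already be defined as yearmonth values; they are not set in this snippet
All_Employees <- All_Employees %>%
  filter((Month >= start_month), (Month <= end_month))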

# Source: https://beta.bls.gov/dataViewer/view/timeseries/CES0500000003
Average_Hourly_Earnings <- read_csv('https://raw.githubusercontent.com/InfiniteCuriosity/predicting_labor/main/Average_Hourly_Earnings.csv', col_select = c(Label, Value), show_col_types = FALSE)
Average_Hourly_Earnings <- Average_Hourly_Earnings %>%
  rename(Month = Label, Avg_Hourly_Earnings = Value)
Average_Hourly_Earnings <- Average_Hourly_Earnings %>% 
  mutate(Month = yearmonth(Month)) %>% 
  as_tsibble(index = Month) %>% 
  mutate(Avg_Hourly_Earnings_Diff = difference(Avg_Hourly_Earnings))

# Uses the same start_month and end_month bounds, which are not set in this snippet
Average_Hourly_Earnings <- Average_Hourly_Earnings %>%
  filter((Month >= start_month), (Month <= end_month))

Monthly_labor_data_small <- 
  tsibble(
    Month = All_Employees$Month,
    index = Month,
    'Total_Employees' = All_Employees$Total_Employees,
    'Avg_Earnings' = Average_Hourly_Earnings$Avg_Hourly_Earnings
  )

start_month_small = yearmonth("2020 Mar")
end_month_small = yearmonth("2022 Jan")  

Monthly_labor_data_small <- Monthly_labor_data_small %>% 
  filter((Month >= start_month_small), (Month <= end_month_small))


Monthly_labor_data_small %>% 
  model(
  linear = TSLM(Total_Employees ~ .,))

The error is: Error in TSLM(Total_Employees ~ ., ) : unused argument (alist())

But this runs fine if I list everything out:

fit <- Monthly_labor_data_small %>% 
  model(
  linear = TSLM(Total_Employees ~ Avg_Earnings + season() + trend()))

report(fit)

The full tsibble will have a large number of columns. Is there a short way to include all of them, similar to what can be done in linear regression?


Solution

  • You should be able to do something like this:

    resp <- "Total_Employees"
    form <- reformulate(response = resp,
       c(setdiff(names(Monthly_labor_data_small), resp),
        "season()", "trend()"))
    

    Then pass form to your model in place of the written-out formula. I haven't tried this on your example. If there are other variables (like a time index) that should not be explicitly included in the model, then the second argument to setdiff() should be c(resp, "excluded_var2", "excluded_var3").
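
As a minimal sketch of how those pieces might fit together, the snippet below builds the formula from a plain vector of column names and excludes the Month index along with the response. The column name Unemployment_Rate is hypothetical and only stands in for the extra columns of a wider tsibble. The commented-out model() call is an assumption: it relies on TSLM() accepting a formula stored in a variable, and it has not been run against the data in the question.

```r
# Hypothetical column names standing in for names(Monthly_labor_data_small)
# on a wider tsibble; Unemployment_Rate is made up for illustration.
cols <- c("Month", "Total_Employees", "Avg_Earnings", "Unemployment_Rate")

resp     <- "Total_Employees"  # response variable
excluded <- c(resp, "Month")   # drop the response and the time index

# Every remaining column becomes a predictor, followed by the
# season() and trend() terms used in the question.
form <- reformulate(
  termlabels = c(setdiff(cols, excluded), "season()", "trend()"),
  response   = resp
)
print(form)

# Assumed usage with fpp3 loaded (not run here):
# fit <- Monthly_labor_data_small %>%
#   model(linear = TSLM(form))
# report(fit)
```

Listing the index in excluded keeps Month out of the predictors, which is the case the answer warns about. With a real tsibble, cols would be replaced by names(Monthly_labor_data_small).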