Tags: r, time-series, tidyverse, forecasting

Cross-validation in time series is returning errors, but the non-cross-validation method runs without errors and with higher accuracy


I'm working to forecast the number of employees in the United States by month. The data is loaded as follows (the source URL is in the code):

library(tidyverse)
library(fpp3)

# Source: https://beta.bls.gov/dataViewer/view/timeseries/CES0000000001
All_Employees <- read_csv('https://raw.githubusercontent.com/InfiniteCuriosity/predicting_labor/main/All_Employees.csv', col_select = c(Label, Value), show_col_types = FALSE)

# Rename the columns, convert the month labels to yearmonth, and index the tsibble on Month
All_Employees <- All_Employees %>%
  rename(Month = Label, Total_Employees = Value) %>%
  mutate(Month = yearmonth(Month)) %>%
  as_tsibble(index = Month)
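
As a quick sanity check (a minimal sketch, assuming the CSV parses as above), you can confirm that the tsibble has a regular, gap-free monthly index before modelling:

# has_gaps() reports whether the monthly index is missing any periods
All_Employees %>% has_gaps()

# quick visual inspection of the series
All_Employees %>% autoplot(Total_Employees)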

I'm following the excellent text Forecasting: Principles and Practice, 3rd Edition, specifically the page that discusses time series cross-validation.

Here is the code I'm running using cross-validation:

All_Employees_train <- All_Employees %>% 
  stretch_tsibble()

All_Employees_train %>% 
  model(
    linear = TSLM(Total_Employees ~ trend() + season()),
    Exponential = TSLM(log(Total_Employees) ~ trend() + season()),
    Arima = ARIMA(Total_Employees ~ trend() + season()),
    Ets = ETS(Total_Employees),
    Mean = MEAN(Total_Employees),
    Naive = NAIVE(Total_Employees),
    SNaive = SNAIVE(Total_Employees),
    Drift = SNAIVE(Total_Employees ~ drift())) %>%
  forecast(h = 3) %>% 
  accuracy(All_Employees) %>% 
  arrange(RMSE)

That code returns the result below, plus more than 50 warnings (several of which wrap actual model-fitting errors):

# A tibble: 8 × 10
  .model      .type    ME   RMSE   MAE    MPE  MAPE  MASE RMSSE  ACF1
  <chr>       <chr> <dbl>  <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
1 Naive       Test   162.  2168.  747.  0.101 0.541 0.226 0.487 0.685
2 Ets         Test   214.  4227.  774.  0.145 0.563 0.234 0.949 0.608
3 SNaive      Test   806.  4453. 3303.  0.515 2.36  1     1.00  0.866
4 Drift       Test   535.  4469. 3170.  0.343 2.28  0.960 1.00  0.868
5 Exponential Test  1861.  4692. 3942.  1.27  2.81  1.19  1.05  0.934
6 linear      Test  1887.  4697. 3952.  1.29  2.81  1.20  1.05  0.934
7 Mean        Test  3565.  6724. 5410.  2.37  3.77  1.64  1.51  0.959
8 Arima       Test  -488. 11113. 2290. -0.383 1.65  0.693 2.50  0.673
There were 50 or more warnings (use warnings() to see the first 50)

Here are a few of the 50+ warnings:

Warning messages:
1: In for (i in namD) if (is.character(data[[i]])) data[[i]] <- factor(data[[i]]) :
  closing unused connection 12 (<-localhost:11913)

11: Provided exogenous regressors are rank deficient, removing regressors: `season()year2`, `season()year3`, `season()year4`, `season()year5`, `season()year6`, `season()year7`, `season()year8`, `season()year9`, `season()year10`, `season()year11`, `season()year12`

24: In sqrt(diag(best$var.coef)) : NaNs produced

27: 12 errors (2 unique) encountered for Arima

28: 3 errors (2 unique) encountered for Ets
[2] Not enough data to estimate this ETS model.
[1] only 1 case, but 2 variables

50: Problem while computing `Exponential = (function (object, ...) ...`.
ℹ prediction from a rank-deficient fit may be misleading
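
(A hypothetical illustration, not part of the original run: the rank-deficiency warning can be reproduced in isolation by fitting the seasonal TSLM on a slice shorter than one year.)

# With only 6 months of data, the 11 seasonal dummies produced by season()
# cannot all be estimated, so the underlying lm() fit is rank deficient,
# and forecasting from it should raise the same warning as above.
All_Employees %>%
  slice(1:6) %>%
  model(linear = TSLM(Total_Employees ~ trend() + season())) %>%
  forecast(h = 3)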

However, if I simply make a training set and run the exact same code against it, no errors are returned, the best models have a much lower RMSE than under cross-validation, and the results come back much faster (for obvious reasons). Here is the code to make the training set, and the results:

All_Employees_train <- All_Employees %>% 
  filter(Month <= yearmonth("2022 Feb"))
# A tibble: 8 × 10
  .model      .type     ME   RMSE    MAE   MPE  MAPE  MASE RMSSE      ACF1
  <chr>       <chr>  <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>     <dbl>
1 Naive       Test    819.   885.   819. 0.541 0.541 0.252 0.201 -0.000688
2 Ets         Test    825.   891.   825. 0.545 0.545 0.254 0.202 -0.000688
3 Arima       Test   1656.  1861.  1656. 1.09  1.09  0.509 0.422 -0.120   
4 Exponential Test   3075.  3178.  3075. 2.03  2.03  0.946 0.720 -0.151   
5 linear      Test   3172.  3265.  3172. 2.10  2.10  0.976 0.740 -0.143   
6 Drift       Test   5810.  5810.  5810. 3.84  3.84  1.79  1.32  -0.378   
7 SNaive      Test   6521.  6522.  6521. 4.31  4.31  2.01  1.48  -0.378   
8 Mean        Test  11457. 11462. 11457. 7.57  7.57  3.53  2.60  -0.000688
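
(For completeness, the run above presumably reuses the exact pipeline from the cross-validation attempt, i.e. something like this sketch; two of the eight models are shown for brevity:)

All_Employees_train %>%
  model(
    Naive = NAIVE(Total_Employees),
    Ets = ETS(Total_Employees)
    # ... the remaining six models, exactly as in the cross-validation block
  ) %>%
  forecast(h = 3) %>%
  accuracy(All_Employees) %>%
  arrange(RMSE)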

How can the cross-validation method be run without errors (and hopefully with better results)?


Solution

  • Your stretched data set contains very short time series, and fitting models to them is what causes these warnings. When you use stretch_tsibble(), set .init to a larger number; it controls the length of the shortest training series. For example, use at least two years of data in each training set:

    All_Employees_train <- All_Employees %>% 
      stretch_tsibble(.init = 24)
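
  • To see why the default stretch fails: the first folds contain only 1, 2, 3, ... observations, which is far too little to estimate seasonal TSLM, ARIMA, or ETS models (hence messages like "only 1 case, but 2 variables"). A minimal sketch of the diagnosis and of the fix follows; the .step argument is optional and only reduces the number of folds, and therefore the runtime:

    # Fold sizes under the default stretch: 1, 2, 3, ...
    All_Employees %>%
      stretch_tsibble() %>%
      as_tibble() %>%
      count(.id) %>%
      head(3)

    # With .init = 24, every fold has at least two years of data;
    # .step = 3 grows the window three months at a time, for roughly
    # a threefold reduction in folds (and runtime).
    All_Employees %>%
      stretch_tsibble(.init = 24, .step = 3) %>%
      model(Ets = ETS(Total_Employees)) %>%  # the full eight-model block from above works the same way
      forecast(h = 3) %>%
      accuracy(All_Employees) %>%
      arrange(RMSE)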