I have data with mixed frequencies and the date variable is a character vector. I am trying to use case_when()
to parse the dates depending on the assigned frequency. The yq()
and ym()
functions each work fine if I split the dataframe by frequency and don't use the conditional logic. However, I am getting warnings about parsing failures when I use case_when()
on the mixed frequency table. Specifically, the failures are attributed to months 05-12 in monthly data. The parsing functions appear to return the expected result, but I am concerned whether that will always be the case and whether there's a pitfall I'm missing. What is the actual parsing failure and why is it happening?
Reprex:
library(dplyr, quietly = TRUE)
library(lubridate, quietly = TRUE)
library(purrr, quietly = TRUE)
library(stringr, quietly = TRUE)
### Data - construct date columns in the format of my data
qtrs <- paste(":0", 1:4, sep = "")
mnths <- c(paste(":0", 1:9, sep = ""), paste(":", 10:12, sep = ""))
yr_qtrs <- map(1979:1980, ~str_c(.x, qtrs)) %>% list_c()
yr_mnths <- map(1979:1980, ~str_c(.x, mnths)) %>% list_c()
#### Create separate tables and assign frequency variable
qtr_tbl <- tibble(date = yr_qtrs, freq = "quarterly")
mnth_tbl <- tibble(date = yr_mnths, freq = "monthly")
#### Create a mixed frequency tibble
mixed_tbl <- bind_rows(qtr_tbl, mnth_tbl)
### Quarters parse fine with yq()
qtr_tbl %>%
mutate(date_new = yq(date))
#> # A tibble: 8 × 3
#> date freq date_new
#> <chr> <chr> <date>
#> 1 1979:01 quarterly 1979-01-01
#> 2 1979:02 quarterly 1979-04-01
#> 3 1979:03 quarterly 1979-07-01
#> 4 1979:04 quarterly 1979-10-01
#> 5 1980:01 quarterly 1980-01-01
#> 6 1980:02 quarterly 1980-04-01
#> 7 1980:03 quarterly 1980-07-01
#> 8 1980:04 quarterly 1980-10-01
### Months parse fine with ym()
mnth_tbl %>%
mutate(date_new = ym(date))
#> # A tibble: 24 × 3
#> date freq date_new
#> <chr> <chr> <date>
#> 1 1979:01 monthly 1979-01-01
#> 2 1979:02 monthly 1979-02-01
#> 3 1979:03 monthly 1979-03-01
#> 4 1979:04 monthly 1979-04-01
#> 5 1979:05 monthly 1979-05-01
#> 6 1979:06 monthly 1979-06-01
#> 7 1979:07 monthly 1979-07-01
#> 8 1979:08 monthly 1979-08-01
#> 9 1979:09 monthly 1979-09-01
#> 10 1979:10 monthly 1979-10-01
#> # ℹ 14 more rows
### Chooses the parsing function based on the freq variable for the mixed freq tbl.
### It gives 16 warnings, which correspond to months 05:12 for each of the two years
mixed_tbl %>%
mutate(date_new = case_when(freq == "monthly" ~ ym(date),
freq == "quarterly" ~ yq(date),
TRUE ~ NA_Date_
)
) %>% print(n = 32)
#> Warning: There was 1 warning in `mutate()`.
#> ℹ In argument: `date_new = case_when(...)`.
#> Caused by warning:
#> ! 16 failed to parse.
#> # A tibble: 32 × 3
#> date freq date_new
#> <chr> <chr> <date>
#> 1 1979:01 quarterly 1979-01-01
#> 2 1979:02 quarterly 1979-04-01
#> 3 1979:03 quarterly 1979-07-01
#> 4 1979:04 quarterly 1979-10-01
#> 5 1980:01 quarterly 1980-01-01
#> 6 1980:02 quarterly 1980-04-01
#> 7 1980:03 quarterly 1980-07-01
#> 8 1980:04 quarterly 1980-10-01
#> 9 1979:01 monthly 1979-01-01
#> 10 1979:02 monthly 1979-02-01
#> 11 1979:03 monthly 1979-03-01
#> 12 1979:04 monthly 1979-04-01
#> 13 1979:05 monthly 1979-05-01
#> 14 1979:06 monthly 1979-06-01
#> 15 1979:07 monthly 1979-07-01
#> 16 1979:08 monthly 1979-08-01
#> 17 1979:09 monthly 1979-09-01
#> 18 1979:10 monthly 1979-10-01
#> 19 1979:11 monthly 1979-11-01
#> 20 1979:12 monthly 1979-12-01
#> 21 1980:01 monthly 1980-01-01
#> 22 1980:02 monthly 1980-02-01
#> 23 1980:03 monthly 1980-03-01
#> 24 1980:04 monthly 1980-04-01
#> 25 1980:05 monthly 1980-05-01
#> 26 1980:06 monthly 1980-06-01
#> 27 1980:07 monthly 1980-07-01
#> 28 1980:08 monthly 1980-08-01
#> 29 1980:09 monthly 1980-09-01
#> 30 1980:10 monthly 1980-10-01
#> 31 1980:11 monthly 1980-11-01
#> 32 1980:12 monthly 1980-12-01
Thanks in advance for feedback!
case_when
does not control what will be evaluated nor does it mask input vectors by conditions, meaning that yq(date)
is exactly what it is -- yq()
called with a date
vector. Just to illustrate what's happening here, we could make ym()
and yq()
calls a bit more verbose:
library(dplyr, quietly = TRUE)
library(lubridate, quietly = TRUE)
library(purrr, quietly = TRUE)
library(stringr, quietly = TRUE)
# override ym & yq to get an idea how those are called
ym <- \(dates, ...) {message(">> ym() called with: ", paste(dates, collapse = " ")); lubridate::ym(dates,...)}
yq <- \(dates, ...) {message(">> yq() called with: ", paste(dates, collapse = " ")); lubridate::yq(dates,...)}
# test how it works:
ym(c("2000:04","2000:05"))
#> >> ym() called with: 2000:04 2000:05
#> [1] "2000-04-01" "2000-05-01"
yq(c("2000:04","2000:05"))
#> >> yq() called with: 2000:04 2000:05
#> Warning: 1 failed to parse.
#> [1] "2000-10-01" NA
# now with case_when() to see if input vectors
# for ym() and yq() are different:
mixed_tbl %>%
mutate(date_new = case_when(freq == "monthly" ~ ym(date),
freq == "quarterly" ~ yq(date),
TRUE ~ NA_Date_
)
) %>%
glimpse()
#> >> ym() called with: 1979:01 1979:02 1979:03 1979:04 1980:01 1980:02 1980:03 1980:04 1979:01 1979:02 1979:03 1979:04 1979:05 1979:06 1979:07 1979:08 1979:09 1979:10 1979:11 1979:12 1980:01 1980:02 1980:03 1980:04 1980:05 1980:06 1980:07 1980:08 1980:09 1980:10 1980:11 1980:12
#> >> yq() called with: 1979:01 1979:02 1979:03 1979:04 1980:01 1980:02 1980:03 1980:04 1979:01 1979:02 1979:03 1979:04 1979:05 1979:06 1979:07 1979:08 1979:09 1979:10 1979:11 1979:12 1980:01 1980:02 1980:03 1980:04 1980:05 1980:06 1980:07 1980:08 1980:09 1980:10 1980:11 1980:12
#> Warning: There was 1 warning in `mutate()`.
#> ℹ In argument: `date_new = case_when(...)`.
#> Caused by warning:
#> ! 16 failed to parse.
#> Rows: 32
#> Columns: 3
#> $ date <chr> "1979:01", "1979:02", "1979:03", "1979:04", "1980:01", "1980:…
#> $ freq <chr> "quarterly", "quarterly", "quarterly", "quarterly", "quarterl…
#> $ date_new <date> 1979-01-01, 1979-04-01, 1979-07-01, 1979-10-01, 1980-01-01, …
One possible solution, split by freq
, apply apropriate function(s) on each part and bind the resulting list back to a single df:
mixed_tbl %>%
split(~freq) %>%
map_at("monthly", ~ mutate(.x, date_new = ym(date))) %>%
map_at("quarterly", ~ mutate(.x, date_new = yq(date))) %>%
list_rbind() %>%
glimpse()
#> >> ym() called with: 1979:01 1979:02 1979:03 1979:04 1979:05 1979:06 1979:07 1979:08 1979:09 1979:10 1979:11 1979:12 1980:01 1980:02 1980:03 1980:04 1980:05 1980:06 1980:07 1980:08 1980:09 1980:10 1980:11 1980:12
#> >> yq() called with: 1979:01 1979:02 1979:03 1979:04 1980:01 1980:02 1980:03 1980:04
#> Rows: 32
#> Columns: 3
#> $ date <chr> "1979:01", "1979:02", "1979:03", "1979:04", "1979:05", "1979:…
#> $ freq <chr> "monthly", "monthly", "monthly", "monthly", "monthly", "month…
#> $ date_new <date> 1979-01-01, 1979-02-01, 1979-03-01, 1979-04-01, 1979-05-01, …
Input data:
mixed_tbl <- structure(list(
date = c("1979:01", "1979:02", "1979:03", "1979:04",
"1980:01", "1980:02", "1980:03", "1980:04", "1979:01", "1979:02",
"1979:03", "1979:04", "1979:05", "1979:06", "1979:07", "1979:08",
"1979:09", "1979:10", "1979:11", "1979:12", "1980:01", "1980:02",
"1980:03", "1980:04", "1980:05", "1980:06", "1980:07", "1980:08",
"1980:09", "1980:10", "1980:11", "1980:12"), freq = c("quarterly",
"quarterly", "quarterly", "quarterly", "quarterly", "quarterly",
"quarterly", "quarterly", "monthly", "monthly", "monthly", "monthly",
"monthly", "monthly", "monthly", "monthly", "monthly", "monthly",
"monthly", "monthly", "monthly", "monthly", "monthly", "monthly",
"monthly", "monthly", "monthly", "monthly", "monthly", "monthly",
"monthly", "monthly")), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -32L))
Created on 2023-05-19 with reprex v2.0.2