I thought I had the answer to my question here, but when I used with my larger data set I get different results. I suspect the difference is because of the way the na.locf
line is acting.
Basically I am converting code where I used mutate_at
to the new syntax with mutate(across())
.
In the first case below, the data is filled correctly, because df_initial
is still grouped by the index_name. In the second case, I'm assuming because I had to ungroup for the mutate
across
to work, I get a different answer.
So here is another example with a larger data set to illustrate the problem.
Reproducible example:
df_initial <-
structure(list(Date = structure(c(18681, 18681, 18681, 18681,
18682, 18682, 18682, 18682, 18683, 18683, 18683, 18683, 18684,
18684, 18684, 18684, 18685, 18685, 18685, 18685, 18686, 18686,
18686, 18686), class = "Date"), index_name = c("INDU Index",
"SPX Index", "TPX Index", "MEXBOL Index", "INDU Index", "SPX Index",
"TPX Index", "MEXBOL Index", "INDU Index", "SPX Index", "TPX Index",
"MEXBOL Index", "INDU Index", "SPX Index", "TPX Index", "MEXBOL Index",
"INDU Index", "SPX Index", "TPX Index", "MEXBOL Index", "INDU Index",
"SPX Index", "TPX Index", "MEXBOL Index"), index_level = c(31537.35,
3881.37, NA, 45268.33, 31961.86, 3925.43, 1903.07, 45151.38,
31402.01, 3829.34, 1926.23, 44310.27, 30932.37, 3811.15, 1864.49,
44592.91, NA, NA, NA, NA, NA, NA, NA, NA), totalReturn_daily = c(0.0497,
0.1277, 0, 0.7158, 1.3461, 1.1364, -1.8201, -0.1151, -1.7181,
-2.4339, 1.2411, -1.8629, -1.4628, -0.4636, -3.2052, 0.6379,
0, 0, 0, 0, 0, 0, 0, 0)), row.names = c(NA, -24L), groups = structure(list(
index_name = c("INDU Index", "MEXBOL Index", "SPX Index",
"TPX Index"), .rows = structure(list(c(1L, 5L, 9L, 13L, 17L,
21L), c(4L, 8L, 12L, 16L, 20L, 24L), c(2L, 6L, 10L, 14L,
18L, 22L), c(3L, 7L, 11L, 15L, 19L, 23L)), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -4L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
The first approach below gives the correct values, but the second approach below does not. So I am trying to get the same answer in approach #2 and I get in approach #1.
# Approach 1: Expected output received here:
df_initial %>%
mutate_at(vars(-index_name, -totalReturn_daily),
~ na.locf(., na.rm = FALSE)) %>%
filter(index_name == "TPX Index")
# Output
Date index_name index_level totalReturn_daily
<date> <chr> <dbl> <dbl>
1 2021-02-23 TPX Index NA 0
2 2021-02-24 TPX Index 1903. -1.82
3 2021-02-25 TPX Index 1926. 1.24
4 2021-02-26 TPX Index 1864. -3.21
5 2021-02-27 TPX Index 1864. 0
6 2021-02-28 TPX Index 1864. 0
# Approach 2: Did not receive expected output here
df_initial %>%
ungroup() %>%
mutate(across(
.cols = -c(index_name, totalReturn_daily),
.fns = ~ na.locf(., na.rm = FALSE)
)) %>%
filter(index_name == "TPX Index")
# Output
Date index_name index_level totalReturn_daily
<date> <chr> <dbl> <dbl>
1 2021-02-23 TPX Index 3881. 0
2 2021-02-24 TPX Index 1903. -1.82
3 2021-02-25 TPX Index 1926. 1.24
4 2021-02-26 TPX Index 1864. -3.21
5 2021-02-27 TPX Index 44593. 0
6 2021-02-28 TPX Index 44593. 0
Thanks!
Both approaches gave similar results for me. Could you try the code below?
library(zoo)
df_initial %>%
group_by(index_name) %>%
mutate_at(vars(-index_name, -totalReturn_daily),
~ na.locf(., na.rm = FALSE)) %>%
dplyr::filter(index_name == "TPX Index")
df_initial %>%
group_by(index_name) %>%
mutate(across(
.cols = -c(totalReturn_daily),
.fns = ~ na.locf(., na.rm = FALSE)
)) %>%
ungroup() %>%
dplyr::filter(index_name == "TPX Index")