Search code examples
rdplyracross

Correct syntax for mutate across when excluding columns


I'm trying to convert code from using mutate_at to using mutate(across()). I'm assuming I have a syntax error but after 45min of trying to figure it out I decided it was time to engage this forum.

Reproducible example:

library(tidyverse)

df_initial <- structure(list(Date = structure(c(18684, 18685, 18686, 18684, 
                                                18685, 18686, 18684, 18685, 18686, 18684, 18685, 18686, 18684, 
                                                18685, 18686, 18684, 18685, 18686, 18684, 18685, 18686, 18684, 
                                                18685, 18686, 18684, 18685, 18686, 18684, 18685, 18686), class = "Date"), 
                             index_name = c("INDU Index", "INDU Index", "INDU Index", 
                                            "SPX Index", "SPX Index", "SPX Index", "MID Index", "MID Index", 
                                            "MID Index", "SML Index", "SML Index", "SML Index", "CCMP Index", 
                                            "CCMP Index", "CCMP Index", "RTY Index", "RTY Index", "RTY Index", 
                                            "S5INFT Index", "S5INFT Index", "S5INFT Index", "S5FINL Index", 
                                            "S5FINL Index", "S5FINL Index", "S5HLTH Index", "S5HLTH Index", 
                                            "S5HLTH Index", "S5CONS Index", "S5CONS Index", "S5CONS Index"
                             ), index_level = c(30932.37, NA, NA, 3811.15, NA, NA, 2496.26, 
                                                NA, NA, 1278.56, NA, NA, 13192.35, NA, NA, 2201.051, NA, 
                                                NA, 2293.4, NA, NA, 535.64, NA, NA, 1311.27, NA, NA, 649.39, 
                                                NA, NA), totalReturn_daily = c(-1.4628, 0, 0, -0.4636, 0, 
                                                                               0, -0.0888, 0, 0, -0.3891, 0, 0, 0.5587, 0, 0, 0.0507, 0, 
                                                                               0, 0.5991, 0, 0, -1.9617, 0, 0, -0.8079, 0, 0, -1.6277, 0, 
                                                                               0)), row.names = c(NA, -30L), groups = structure(list(index_name = c("CCMP Index", 
                                                                                                                                                    "INDU Index", "MID Index", "RTY Index", "S5CONS Index", "S5FINL Index", 
                                                                                                                                                    "S5HLTH Index", "S5INFT Index", "SML Index", "SPX Index"), .rows = structure(list(
                                                                                                                                                      13:15, 1:3, 7:9, 16:18, 28:30, 22:24, 25:27, 19:21, 10:12, 
                                                                                                                                                      4:6), ptype = integer(0), class = c("vctrs_list_of", "vctrs_vctr", 
                                                                                                                                                                                          "list"))), row.names = c(NA, 10L), class = c("tbl_df", "tbl", 
                                                                                                                                                                                                                                       "data.frame"), .drop = TRUE), class = c("grouped_df", "tbl_df", 
                                                                                                                                                                                                                                                                               "tbl", "data.frame"))

head(df_initial)


# A tibble: 6 x 4
# Groups:   index_name [2]
  Date       index_name index_level totalReturn_daily
  <date>     <chr>            <dbl>             <dbl>
1 2021-02-26 INDU Index      30932.            -1.46 
2 2021-02-27 INDU Index         NA              0    
3 2021-02-28 INDU Index         NA              0    
4 2021-02-26 SPX Index        3811.            -0.464
5 2021-02-27 SPX Index          NA              0    
6 2021-02-28 SPX Index          NA              0 

My previous code with mutate_at works fine:

df1 <- df_initial %>%
  mutate_at(vars(-index_name, -totalReturn_daily),
            ~ na.locf(., na.rm = FALSE)
  )

head(df1)
# A tibble: 6 x 4
# Groups:   index_name [2]
  Date       index_name index_level totalReturn_daily
  <date>     <chr>            <dbl>             <dbl>
1 2021-02-26 INDU Index      30932.            -1.46 
2 2021-02-27 INDU Index      30932.             0    
3 2021-02-28 INDU Index      30932.             0    
4 2021-02-26 SPX Index        3811.            -0.464
5 2021-02-27 SPX Index        3811.             0    
6 2021-02-28 SPX Index        3811.             0  

When I try to convert to mutate and across using this, I get an error:

df2 <- df_initial %>%
  mutate(across(.cols = -c(index_name, totalReturn_daily),
                .fns  = ~ na.locf(., na.rm = FALSE)
  )
  )

Error: Can't subset elements that don't exist.
x Location 56 doesn't exist.
i There are only 10 elements.
Run `rlang::last_error()` to see where the error occurred.

I suspect the error stems from trying to exclude the columns in the .cols = line, but I tried .cols = !c(index_name, totalReturn_daily) and .cols = c(-index_name, -totalReturn_daily) and even .cols != c(index_name, totalReturn_daily) but I get the same error.

I appreciate the help!


Solution

  • Your data is grouped by index_name, across does not find columns that are grouped. Try to ungroup the data first :

    library(dplyr)
    df_initial %>%
      ungroup %>%
      mutate(across(.cols = -c(index_name, totalReturn_daily),
                    .fns  = ~ na.locf(., na.rm = FALSE)))