
Creating factors fails inside dplyr mutate(across) call


Creating factor levels in a dataset with NAs works for individual columns, but I need to iterate across many more columns (all starting with 'impact.'), and I have hit a problem inside a dplyr mutate(across) call.

What am I doing wrong?

Reprex below

library(tibble)  # tribble() is provided by the tibble package
library(dplyr)

df <- tribble(~id, ~tumour, ~impact.chemo, ~impact.radio,
        1,'lung',NA,1,
        2,'lung',1,NA,
        3,'lung',2,3,
        4,'meso',3,4,
        5,'lung',4,5)

# Factor labels
trt_labels <- c('Planned', 'Modified', 'Interrupted', 'Deferred', "Omitted")

# Factor levels should map to labels as follows, retaining NAs where present:
data.frame(level = 1:5,
           label = trt_labels)

# Create factor works for individual columns
factor(df$impact.chemo, levels = 1:5, labels = trt_labels)
factor(df$impact.radio, levels = 1:5, labels = trt_labels)

# But fails inside mutate(across)
df %>% 
  mutate(across(.cols = starts_with('impact'), ~factor(levels = 1:5, labels = trt_labels)))

Solution

  • Just making @27ϕ9's comment an answer: the purrr-style lambda function you specified inside across is not correct because it is missing its first argument, which is the object the function should operate on (in this case, each dataframe column selected by across). Without it, factor() receives no data and returns a zero-length factor, which mutate cannot fit into a 5-row dataframe.

    To fix your issue, insert .x inside the lambda function. .x is the placeholder for the lambda's argument, so ~factor(.x, ...) is none other than shorthand for function(.x) factor(.x, ...). See the purrr documentation for more info about purrr-style lambda functions.

    df %>% 
      mutate(across(.cols = starts_with('impact'), ~factor(.x, levels = 1:5, labels = trt_labels)))
    
    # A tibble: 5 x 4
    #      id tumour impact.chemo impact.radio
    #   <dbl> <chr>  <fct>        <fct>       
    # 1     1 lung   NA           Planned     
    # 2     2 lung   Planned      NA          
    # 3     3 lung   Modified     Interrupted 
    # 4     4 meso   Interrupted  Deferred    
    # 5     5 lung   Deferred     Omitted
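
As a rough sketch of why the original call fails and what the lambda stands for, the snippet below rebuilds the data from the question and compares the `~factor(.x, ...)` form with an explicit anonymous function. The names `empty`, `res_lambda`, `res_fn` and the argument name `col` are made up here for illustration and do not come from the question.

```r
library(tibble)
library(dplyr)

# Data and labels copied from the question
df <- tribble(~id, ~tumour, ~impact.chemo, ~impact.radio,
        1,'lung',NA,1,
        2,'lung',1,NA,
        3,'lung',2,3,
        4,'meso',3,4,
        5,'lung',4,5)
trt_labels <- c('Planned', 'Modified', 'Interrupted', 'Deferred', 'Omitted')

# Without a first argument, factor() has no data to convert,
# so the result is a zero-length factor
empty <- factor(levels = 1:5, labels = trt_labels)
length(empty)

# purrr-style lambda: .x stands for each selected column
res_lambda <- df %>%
  mutate(across(starts_with('impact'),
                ~factor(.x, levels = 1:5, labels = trt_labels)))

# The same thing written as an explicit anonymous function
# (`col` is an arbitrary argument name chosen for this sketch)
res_fn <- df %>%
  mutate(across(starts_with('impact'),
                function(col) factor(col, levels = 1:5, labels = trt_labels)))
```

Both forms should give the same factor columns. The explicit function is more verbose, but it makes it obvious which object is being passed to factor().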