
Using mutate with multiple functions


I am trying to pass two functions inside mutate(across(where(is.factor), ...)) to order the factor levels by frequency and drop unused levels. The code does not seem to work as expected. Where might I have gone wrong?

#---- Libraries ----

library(tidyverse)

#---- Data ----

set.seed(2021)

df <- tibble(
  a1 = factor(ifelse(sign(rnorm(30))==-1, 0, 1), labels = c("No", "Yes")),
  a2 = factor(ifelse(sign(rnorm(30))==-1, 0, 1), labels = c("No", "Yes")),
  gender = gl(2, 15, labels = c("Males", "Females")),
  b2 = gl(3, 10, labels = c("Primary", "Secondary", "Tertiary", "Unknown")),
  c1 = gl(3, 10, labels = c("15-19", "20-24", "25-30", "30-35")),
  outcome = factor(ifelse(sign(rnorm(30))==-1, 0, 1), labels = c("No", "Yes")),
  weight = runif(30, 1, 12)
)

#---- Problem ----

df <- df %>%
  mutate(across(where(is.factor), list(fct_infreq, fct_drop)))

levels(df$b2)

# The unused levels are not dropped


Solution

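  • The issue is that you are actually creating new columns here. With a list of two functions, across() keeps each original factor column and adds two new ones, one per function, so your resulting data frame contains, for example, b2_1 and b2_2, corresponding to fct_infreq and fct_drop respectively. The original b2 is left unchanged.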

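    If you run levels(df$b2_2) you'll see the unused levels dropped. Note that b2_2 is not also reordered by frequency, because each function in the list is applied to the original column independently.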

    If your aim is to first drop the unused levels and then reorder, you need to apply the functions one after the other, for example with consecutive mutate() calls:

    df <- df %>%
      # Step 1: drop unused factor levels
      mutate(across(where(is.factor), fct_drop)) %>%
      # Step 2: reorder the remaining levels by frequency
      mutate(across(where(is.factor), fct_infreq))

    or nest the functions inside a single mutate() call:

    df <- df %>%
      # fct_drop() runs first, then fct_infreq() reorders its result
      mutate(across(where(is.factor), ~fct_infreq(fct_drop(.x))))
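A quick way to confirm the diagnosis is to run both versions on a small example. The sketch below uses a hypothetical toy factor (not the question's df) in which "b" appears three times, "a" once and "c" never, and it loads only dplyr and forcats rather than the whole tidyverse:

```r
library(dplyr)
library(forcats)

# Hypothetical toy data: "c" is an unused level, "b" is the most frequent
toy <- data.frame(f = factor(c("a", "b", "b", "b"), levels = c("a", "b", "c")))

# Unnamed list of functions: across() keeps the original column and adds
# one new column per function, named "{.col}_{.fn}" (here f_1 and f_2)
listed <- toy %>%
  mutate(across(where(is.factor), list(fct_infreq, fct_drop)))

names(listed)      # "f" "f_1" "f_2"
levels(listed$f)   # unchanged: "a" "b" "c"
levels(listed$f_1) # reordered only: "b" "a" "c"
levels(listed$f_2) # dropped only: "a" "b"

# Nesting applies both steps, in order, to the same column in place
nested <- toy %>%
  mutate(across(where(is.factor), ~ fct_infreq(fct_drop(.x))))

levels(nested$f)   # "b" "a"
```

The list form never combines the two steps: each function sees the untouched original column, which is why neither new column is both dropped and reordered.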
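If you prefer passing a plain function rather than a lambda, purrr::compose() can build the same drop-then-reorder step. This is an alternative not used in the answer above, so treat it as a sketch: compose() applies its arguments right to left by default, so compose(fct_infreq, fct_drop) behaves like fct_infreq(fct_drop(x)). The toy data and the name drop_then_infreq are hypothetical:

```r
library(dplyr)
library(forcats)
library(purrr)

# Hypothetical toy data: "c" is an unused level, "b" is the most frequent
toy <- data.frame(f = factor(c("a", "b", "b", "b"), levels = c("a", "b", "c")))

# compose() runs right to left by default (.dir = "backward"),
# so this drops unused levels first and then reorders by frequency
drop_then_infreq <- compose(fct_infreq, fct_drop)

composed <- toy %>%
  mutate(across(where(is.factor), drop_then_infreq))

levels(composed$f) # "b" "a"
```

Because a single function is passed, across() overwrites each factor column in place instead of adding suffixed columns.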