Search code examples
rrecodedplyr

Recode multiple columns using dplyr


I had a dataframe where I recoded several columns so that 999 was set to NA

dfB <-dfA %>%
  mutate(adhere = if_else(adhere==999, as.numeric(NA), adhere)) %>%
  mutate(engage = if_else(engage==999, as.numeric(NA), engage)) %>%
  mutate(quality = if_else(quality==999, as.numeric(NA), quality)) %>%
  mutate(undrstnd = if_else(undrstnd==999, as.numeric(NA), undrstnd)) %>%
  mutate(sesspart = if_else(sesspart==999, as.numeric(NA), sesspart)) %>%
  mutate(attended = if_else(attended>=9, as.integer(NA), attended))

I want to use mutate_at() and a range of columns and recode() instead of if_else(), but I am stuck on how to give it the condition. I think something like 999 = NA based on some mutate_all examples -- but I also need the NA to match the type of .x and I am unsure how to get it to be type sensitive

I tried:

y <- data.frame(y1=c(1,2,999,3,4), y2=c(1L, 2L, 999L, 3L, 4L), y3=c(T,T,F,F,T))
z <- y %>%
    mutate_at( vars(y1:y2), funs(recode(.,`999` = as.numeric(NA))))

But I get a warning "Unreplaced values treated as NA as .x is not compatible. Please specify replacements exhaustively or supply .default " and I can see that it worded for the numeric column, but not for the integer column y2"

> z
  y1 y2    y3
1  1 NA  TRUE
2  2 NA  TRUE
3 NA NA FALSE
4  3 NA FALSE
5  4 NA  TRUE

Solution

  • Currently, based on dplyr documentation:

    across() supersedes the family of "scoped variants" like summarise_at(), summarise_if(), and summarise_all().

    So, using mutate and across instead is now recommended.

    Like Chris LeBoa said, if you only want to convert an annoying value to NA, the function na_if() is probably the best choice:

    y <- data.frame(y1=c(1,2,999,3,4), y2=c(1L, 2L, 999L, 3L, 4L), y3=c(T,T,F,F,T))
    
    y
       y1  y2    y3
    1   1   1  TRUE
    2   2   2  TRUE
    3 999 999 FALSE
    4   3   3 FALSE
    5   4   4  TRUE
     
    z <- y %>%
        mutate(across(
            y1:y2,
            ~na_if(., 999)
        ))
    
    z
      y1 y2    y3
    1  1  1  TRUE
    2  2  2  TRUE
    3 NA NA FALSE
    4  3  3 FALSE
    5  4  4  TRUE
    

    Similarly, if you really want to recode values in multiple columns, you can follow the example from bcarothers:

    df1 <- tibble(Q7_1=1:5,
                  Q7_1_TEXT=c("let's","see","grogu","this","week"),
                  Q8_1=6:10,
                  Q8_1_TEXT=rep("grogu",5),
                  Q8_2=11:15,
                  Q8_2_TEXT=c("grogu","is","the","absolute","best"))
    
    df2 <- df1 %>%
        mutate(across(
            starts_with("Q8") & ends_with("TEXT"),
            ~recode(., "grogu"="mando")
        ))