Search code examples
rdplyr

Difficulty using case_when() to add column that, conditionally, pastes value from another column


I'm trying to use case_when() and mutate() for data wrangling, but I'm running into difficulty: I want to add a new column (new_var) such that if var_1 contains "A" then new_var will contain "A". But if var_1 contains "B" or (real) NAs (indeed anything other than "A") then new_var will become whatever value is found in var_2.

df <- tibble(var_1=c('A','B','A',NA,'B','B',NA,'B'), 
var_2=c('MM','QQ','ZZ','GG','MM','QQ','ZZ','GG'))

 A tibble: 8 × 2
  var_1 var_2
  <chr> <chr>
1 A     MM   
2 B     QQ   
3 A     ZZ   
4 NA    GG   
5 B     MM   
6 B     QQ   
7 NA    ZZ   
8 B     GG   

My desired outcome is this:

# A tibble: 8 × 3
  var_1 var_2 new_var
  <chr> <chr> <chr>  
1 A     MM    A      
2 B     QQ    QQ     
3 A     ZZ    A      
4 NA    GG    GG     
5 B     MM    MM     
6 B     QQ    QQ     
7 NA    ZZ    ZZ     
8 B     GG    GG  

I've reviewed numerous questions involving case_when() and mutate(), but none address how to achieve new_var becoming whatever value is found in var_2. This is my first attempt to use case_when().

I've got the general approach to creating the new column, but then I'm stuck:

library(dplyr)

df %>%
  mutate(new_var = case_when(var_1 == 'A' ~ 'A',
         is.na(var_1) ~ var_2,  ##obv ~var_2 doesn't work
         var_1 == 'B' ~ var_2),
         TRUE = 'Unknown')

Because I hope to include this in a longer pipe, I'd prefer a solution using dplyr. Thanks in advance!


Solution

  • You have syntax errors above, including unbalanced parentheses, and also the final TRUE condition in case_when() should also use tilde. Making all these changes, it seems to be working:

    df <- df %>%
        mutate(new_var = case_when(
            var_1 == "A" ~ "A",
            is.na(var_1) ~ var_2,
            var_1 == "B" ~ var_2,
            TRUE ~ "Unknown")
        )
    
    df
    
    # A tibble: 8 x 3
      var_1 var_2 new_var
      <chr> <chr> <chr>
    1 A     MM    A
    2 B     QQ    QQ
    3 A     ZZ    A
    4 <NA>  GG    GG
    5 B     MM    MM
    6 B     QQ    QQ
    7 <NA>  ZZ    ZZ
    8 B     GG    GG