I'm trying to use case_when()
and mutate()
for data wrangling, but I'm running into difficulty: I want to add a new column (new_var
) such that if var_1
contains "A"
then new_var
will contain "A"
. But if var_1
contains "B"
or (real) NAs (indeed anything other than "A"
) then new_var
will become whatever value is found in var_2
.
df <- tibble(var_1=c('A','B','A',NA,'B','B',NA,'B'),
var_2=c('MM','QQ','ZZ','GG','MM','QQ','ZZ','GG'))
A tibble: 8 × 2
var_1 var_2
<chr> <chr>
1 A MM
2 B QQ
3 A ZZ
4 NA GG
5 B MM
6 B QQ
7 NA ZZ
8 B GG
My desired outcome is this:
# A tibble: 8 × 3
var_1 var_2 new_var
<chr> <chr> <chr>
1 A MM A
2 B QQ QQ
3 A ZZ A
4 NA GG GG
5 B MM MM
6 B QQ QQ
7 NA ZZ ZZ
8 B GG GG
I've reviewed numerous questions involving case_when()
and mutate()
, but none address how to achieve new_var
becoming whatever value is found in var_2
. This is my first attempt to use case_when()
.
I've got the general approach to creating the new column, but then I'm stuck:
library(dplyr)
df %>%
mutate(new_var = case_when(var_1 == 'A' ~ 'A',
is.na(var_1) ~ var_2, ##obv ~var_2 doesn't work
var_1 == 'B' ~ var_2),
TRUE = 'Unknown')
Because I hope to include this in a longer pipe, I'd prefer a solution using dplyr. Thanks in advance!
You have syntax errors above, including unbalanced parentheses, and also the final TRUE
condition in case_when()
should also use tilde. Making all these changes, it seems to be working:
df <- df %>%
mutate(new_var = case_when(
var_1 == "A" ~ "A",
is.na(var_1) ~ var_2,
var_1 == "B" ~ var_2,
TRUE ~ "Unknown")
)
df
# A tibble: 8 x 3
var_1 var_2 new_var
<chr> <chr> <chr>
1 A MM A
2 B QQ QQ
3 A ZZ A
4 <NA> GG GG
5 B MM MM
6 B QQ QQ
7 <NA> ZZ ZZ
8 B GG GG