I want to sort a character variable into two categories in a new variable based on conditions, in conditions are not met i want it to return "other".
If variable x cointains 4 character values "A", "B", "C" & "D" I want to sort them into a 2 categories, 1 and 0, in a new variable y, creating a dummy variable
Ideally I want it to look like this
df <- data.frame(x = c("A", "B", "C" & "D")
y <- if x == "A" | "D" then assign 1 in y
if x == "B" | "C" then assign 0 in y
if x == other then assign NA in y
x y
1 "A" 1
2 "B" 0
3 "C" 0
4 "D" 1
library(dplyr)
df <- df %>% mutate ( y =case_when(
(x %in% df == "A" | "D") ~ 1 ,
(x %in% df == "B" | "C") ~ 1,
x %in% df == ~ NA
))
I got this error message
Error: replacement has 3 rows, data has 2
Here's the proper case_when
syntax.
df <- data.frame(x = c("A", "B", "C", "D"))
library(dplyr)
df <- df %>%
mutate(y = case_when(x %in% c("A", "D") ~ 1,
x %in% c("B", "C") ~ 0,
TRUE ~ NA_real_))
df
#> x y
#> 1 A 1
#> 2 B 0
#> 3 C 0
#> 4 D 1