
Trying to sort character variable into new variable with new value based on conditions


I want to sort a character variable into two categories in a new variable based on conditions. If none of the conditions is met, I want the value to fall into an "other" case (NA in the example below).

If variable x contains the four character values "A", "B", "C" and "D", I want to sort them into two categories, 1 and 0, in a new variable y, creating a dummy variable.

Ideally I want it to look like this

df <- data.frame(x = c("A", "B", "C", "D"))

 # pseudocode for the new variable y:
 # if x is "A" or "D", assign 1 to y
 # if x is "B" or "C", assign 0 to y
 # if x is anything else, assign NA to y

    x   y
  1 "A"  1
  2 "B"  0
  3 "C"  0
  4 "D"  1

This is what I tried:



 library(dplyr)
 df <- df %>% mutate ( y =case_when(
  (x %in% df == "A" | "D") ~ 1 , 
  (x %in% df == "B" | "C") ~ 1,
   x %in% df ==  ~ NA
 ))

I got this error message:

Error: replacement has 3 rows, data has 2

Solution

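  • Here's the proper case_when syntax. Test membership with x %in% c(...), and use TRUE as the catch-all condition for everything that matched no earlier line. The fallback is written as NA_real_ so that every branch returns the same numeric type.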

    df <- data.frame(x = c("A", "B", "C", "D"))
     
    library(dplyr)
    
    df <- df %>%
      mutate(y = case_when(x %in% c("A", "D") ~ 1,
                           x %in% c("B", "C") ~ 0,
                           TRUE ~ NA_real_))
    df
    #>   x y
    #> 1 A 1
    #> 2 B 0
    #> 3 C 0
    #> 4 D 1
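
The sample data never reaches the fallback line, so here is a minimal sketch that exercises it. The value "E" and the names df2 and y_longhand are hypothetical, added only for illustration. The second column shows the longhand form of the same conditions: `|` needs a complete comparison on each side, so `x == "A" | x == "D"` is what `x %in% c("A", "D")` abbreviates for non-missing values.

```r
library(dplyr)

# "E" is a made-up extra value that matches neither group
df2 <- data.frame(x = c("A", "B", "C", "D", "E"))

df2 <- df2 %>%
  mutate(
    # membership test, as in the answer above
    y = case_when(x %in% c("A", "D") ~ 1,
                  x %in% c("B", "C") ~ 0,
                  TRUE ~ NA_real_),
    # longhand equivalent: each side of | is a full comparison
    y_longhand = case_when(x == "A" | x == "D" ~ 1,
                           x == "B" | x == "C" ~ 0,
                           TRUE ~ NA_real_)
  )

df2$y
#> [1]  1  0  0  1 NA
```

Both columns agree here; the %in% form is usually easier to read once a group has more than two values.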