Tags: r, list, selection, case-when

How to use (or should I use) case_when to change values based on a list of values?


I believe what I'd like to do is relatively simple; I just don't know the proper terminology to find the answer to my question. I have a data frame with 9 variables. I want to create a new variable that is based on the values in another column. Simple example:

my.df <- data.frame(col1 = sample(c(1,2), 10, replace = TRUE),
        col2 = as.factor(sample(10)), col3 = letters[1:10],
        col4 = sample(c(TRUE, FALSE), 10, replace = TRUE))


    col1 col2 col3  col4
1     2    8    a  TRUE
2     1    3    b FALSE
3     2    4    c FALSE
4     2    2    d  TRUE
5     2    7    e FALSE
6     2    9    f  TRUE
7     2   10    g FALSE
8     2    6    h FALSE
9     1    1    i FALSE
10    2    5    j FALSE

I would like to create col5 using the information in col3. I tried something like this:

my.df<-my.df %>%
  mutate(col5 = case_when(col3 = c("a", "b", "c") ~"green",
                          col3 = c("g", "h", "i", "j")~"red",
                          col3 = c("d", "e", "f")~"purple"))

I am expecting results like this:

    col1 col2 col3  col4    col5
1     2    8    a  TRUE  green
2     1    3    b FALSE  green
3     2    4    c FALSE  green
4     2    2    d  TRUE  purple
5     2    7    e FALSE  purple
6     2    9    f  TRUE  purple
7     2   10    g FALSE  red
8     2    6    h FALSE  red
9     1    1    i FALSE  red
10    2    5    j FALSE  red

The error is "must be a logical vector, not a character vector". If I use == instead (col3 == c("")...), I get warning messages saying that "longer object length is not a multiple of shorter object length".
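For anyone hitting the same two messages, here is a minimal base R sketch of what seems to be going on. As far as I can tell, writing col3 = c(...) ~ "green" makes case_when treat col3 as an argument name, so the left side of the formula is the character vector itself rather than a logical test. With ==, R compares element by element and recycles the shorter vector, which is not a membership test. The standalone col3 vector below is an assumption standing in for my.df$col3.

```r
# Stand-in for my.df$col3 (assumed here so the example is self-contained)
col3 <- letters[1:10]
red  <- c("g", "h", "i", "j")

# `==` lines the two vectors up position by position, recycling `red`
# (g, h, i, j, g, h, i, j, g, h), so no position happens to match.
# It also warns because 10 is not a multiple of 4.
eq <- suppressWarnings(col3 == red)
print(eq)    # all FALSE

# `%in%` asks, for each element of col3, whether it occurs anywhere in `red`.
isin <- col3 %in% red
print(isin)  # FALSE for a-f, TRUE for g-j
```

So %in% is the operator that yields the logical vector case_when expects on the left of each ~.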

My eventual solution was to create a vector of just the names and then use %in%. However, I really think there should be a simpler way to do this, or maybe a different command where I don't have to change values row by row.

Here is what I did get to work, although I had to repeat it for each color:

grn <- c("a", "b", "c")
my.df <- my.df %>%
  mutate(col5 = case_when(col3 %in% grn ~ "green"))

Solution

  • You can use %in% to compare against multiple values directly inside case_when, without creating a separate vector for each color:

    library(dplyr)
    
    my.df %>%
      mutate(col5 = case_when(col3 %in% c("a", "b", "c") ~"green",
                              col3 %in% c("g", "h", "i", "j")~"red",
                              col3 %in% c("d", "e", "f")~"purple"))
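If the number of groups grows, one possible alternative (not part of the answer above, just a sketch) is a named lookup vector in base R, so each color's letters are listed once and the mapping is done by name. The names groups and lookup are hypothetical, and the one-column my.df is a cut-down stand-in for the data frame in the question.

```r
# Hypothetical mapping: one entry per color, listing the letters that belong to it
groups <- list(
  green  = c("a", "b", "c"),
  red    = c("g", "h", "i", "j"),
  purple = c("d", "e", "f")
)

# Flatten into a named character vector: names are the letters, values the colors
lookup <- setNames(
  rep(names(groups), lengths(groups)),
  unlist(groups, use.names = FALSE)
)

# Cut-down stand-in for the question's data frame
my.df <- data.frame(col3 = letters[1:10])

# Index by name; as.character() guards against col3 being a factor,
# in which case R would index by the integer codes instead of the labels
my.df$col5 <- unname(lookup[as.character(my.df$col3)])
print(my.df)
```

Letters that appear in no group come out as NA, which matches what case_when does for rows that satisfy none of its conditions.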