I believe what I'd like to do is relatively simple; I just don't know the proper terminology to search for the answer. I have a data frame with 9 variables, and I want to create a new variable whose values are based on the values in another column. Simple example:
my.df <- data.frame(col1 = sample(c(1,2), 10, replace = TRUE),
col2 = as.factor(sample(10)), col3 = letters[1:10],
col4 = sample(c(TRUE, FALSE), 10, replace = TRUE))
col1 col2 col3 col4
1 2 8 a TRUE
2 1 3 b FALSE
3 2 4 c FALSE
4 2 2 d TRUE
5 2 7 e FALSE
6 2 9 f TRUE
7 2 10 g FALSE
8 2 6 h FALSE
9 1 1 i FALSE
10 2 5 j FALSE
I would like to create col5 using the values in col3. I tried something like this:
my.df<-my.df %>%
mutate(col5 = case_when(col3 = c("a", "b", "c") ~"green",
col3 = c("g", "h", "i", "j")~"red",
col3 = c("d", "e", "f")~"purple"))
I am expecting results like this:
col1 col2 col3 col4 col5
1 2 8 a TRUE green
2 1 3 b FALSE green
3 2 4 c FALSE green
4 2 2 d TRUE purple
5 2 7 e FALSE purple
6 2 9 f TRUE purple
7 2 10 g FALSE red
8 2 6 h FALSE red
9 1 1 i FALSE red
10 2 5 j FALSE red
The error is that the condition must be a logical vector, not a character vector. If I change it to col3 == c(...), I get warning messages saying that the longer object length is not a multiple of the shorter object length.
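For reference, here is a small base R sketch of why == misbehaves here (the vector values are made up for illustration). == compares position by position and recycles the shorter vector, so it asks "is element i equal to the i-th recycled value?" rather than "is element i one of these values?". %in% asks the membership question, which is what each condition needs. Separately, writing col3 = c(...) inside case_when() passes a named argument, so the condition case_when() receives is just the character vector c("a", "b", "c"), which explains the "must be a logical vector" error.

```r
values <- c("c", "a", "b", "d")

# == recycles c("a", "b", "c") to a, b, c, a and compares position by position.
# R warns here: longer object length is not a multiple of shorter object length.
by_position <- values == c("a", "b", "c")

# %in% checks, for each element of values, whether it appears anywhere in the set.
by_membership <- values %in% c("a", "b", "c")

print(by_position)    # every comparison fails, even though "c", "a", "b" are in the set
print(by_membership)
```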
My eventual solution was to create a vector of the values for each color and then use %in%. However, I think there should be a simpler way to do this, or perhaps a different command where I don't have to change the values row by row.
Example of what I did get to work, which I had to repeat for each color:
grn <- c("a", "b", "c")
my.df <- my.df %>%
  mutate(col5 = case_when(col3 %in% grn ~ "green"))
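One caveat with repeating that single-color mutate() for each color: case_when() returns NA for rows that match none of its conditions, so each later call overwrites the colors assigned by the earlier ones. A minimal sketch (using a cut-down data frame with only col3, an assumption for brevity) shows the problem and one way around it, keeping the existing value with a TRUE ~ col5 fallback:

```r
library(dplyr)

df <- data.frame(col3 = letters[1:10])
grn <- c("a", "b", "c")
prp <- c("d", "e", "f")

# First pass: rows a-c become "green", everything else NA
step1 <- df %>% mutate(col5 = case_when(col3 %in% grn ~ "green"))

# Second pass without a fallback: the "green" values are replaced by NA
step2 <- step1 %>% mutate(col5 = case_when(col3 %in% prp ~ "purple"))

# Second pass with a fallback: unmatched rows keep their current col5 value
kept <- step1 %>% mutate(col5 = case_when(col3 %in% prp ~ "purple",
                                          TRUE ~ col5))
```

Putting all the conditions in one case_when() call, as in the answer, avoids this entirely.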
You can use %in% to compare against multiple values:
library(dplyr)
my.df %>%
  mutate(col5 = case_when(col3 %in% c("a", "b", "c") ~ "green",
                          col3 %in% c("g", "h", "i", "j") ~ "red",
                          col3 %in% c("d", "e", "f") ~ "purple"))
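If you would rather not write one condition per color, a base R alternative (a sketch, not the only way) is to build a named lookup vector once and index it by col3. The colour_groups list and demo_df below are hypothetical structures chosen for illustration. as.character() guards against col3 being a factor, because indexing by a factor uses its integer codes rather than its labels. Any col3 value missing from the lookup becomes NA.

```r
# Hypothetical grouping: each color maps to the col3 values it covers
colour_groups <- list(green  = c("a", "b", "c"),
                      red    = c("g", "h", "i", "j"),
                      purple = c("d", "e", "f"))

# Named vector: names are col3 values, elements are the matching colors
lookup <- setNames(rep(names(colour_groups), lengths(colour_groups)),
                   unlist(colour_groups))

demo_df <- data.frame(col3 = letters[1:10])

# Index by the character labels; unname() drops the names carried over from lookup
demo_df$col5 <- unname(lookup[as.character(demo_df$col3)])
print(demo_df)
```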