I have a dataframe as shown below and the output from this code is shown below.
Question: Since in the original dataframe, there can be multiple columns of value format type as in col1 and col2, how the code below can be modified to get the desired output?
#STEP1
df <- data.frame(
col1 = c("abc_1_102", "abc_1_103", "xyz_1_104"),
col2 = c("107", "108", "106")
)
#STEP2
split_text <- strsplit(df$col1, "_")
third_elements <- sapply(split_text, function(x) if(length(x) >= 3) x[3] else NA)
#STEP3
df$col3<-third_elements
#STEP4
selection<-c(107,102,108)
df$col4<-ifelse(df$col2 %in% selection,"SELECT","NOTSELECT")
df$col5<-ifelse(df$col3 %in% selection,"SELECT","NOTSELECT")
#STEP5
df$col6<-paste(df$col4,df$col5,sep = ",")
Output from above code:
col1 col2 col3 col4 col5 col6
1 abc_1_102 107 102 SELECT SELECT SELECT,SELECT
2 abc_1_103 108 103 SELECT NOTSELECT SELECT,NOTSELECT
3 xyz_1_104 106 104 NOTSELECT NOTSELECT NOTSELECT,NOTSELECT
Desired output
col1 col2 col6
1 abc_1_102 107 SELECT,SELECT
2 abc_1_103 108 NOTSELECT,SELECT
3 xyz_1_104 106 NOTSELECT,NOTSELECT
You can do this all in one go with by pasting two ifelse
statements together. The ifelse
for col2
is straightforward. The ifelse
for col3
uses grepl
to search for any of the numbers in select
by creating a search string using paste(..., collapse = "|")
(pasting the "or" operator between them). The outer paste(..., sep = ",")
puts it all together:
df$col6 <- paste(ifelse(df$col2 %in% selection, "SELECT", "NOTSELECT"),
ifelse(grepl(paste(selection, collapse = "|"), df$col1), "SELECT", "NOTSELECT"),
sep = ",")
Or to more safely add a word boundary to the second ifelse
(thanks to @r2evans!)
df$col6 <- paste(ifelse(df$col2 %in% selection, "SELECT", "NOTSELECT"),
ifelse(grepl(
paste0("(^|_)(", paste(selection, collapse = "|"), ")(_|$)\\b"),
df$col1),
"SELECT", "NOTSELECT"),
sep = ",")
Both give the same output in this example:
col1 col2 col6
1 abc_1_102 107 SELECT,SELECT
2 abc_1_103 108 SELECT,NOTSELECT
3 xyz_1_104 106 NOTSELECT,NOTSELECT