I have a dataframe in R called df
:
A = c("ok","WA","WA","ok","WA")
B = c("WB","ok","ok","ok","WB")
C = c("WC","ok","WC","ok","WC")
df = tibble(A,B,C)
df
A B C
<chr> <chr> <chr>
1 ok WB WC
2 WA ok ok
3 WA ok WC
4 ok ok ok
5 WA WB WC
I want to create (mutate) a new column that will concatenate all the values that are not OK like this:
A B C D
<chr> <chr> <chr> <chr>
1 ok WB WC WB,WC
2 WA ok ok WA
3 WA ok WC WA,WC
4 ok ok ok NO W
5 WA WB WC WA,WB,WC
You could use a paste
approach with the help of gsub
:
A <- c("ok","WA","WA","ok","WA")
B <- c("WB","ok","ok","ok","WB")
C <- c("WC","ok","WC","ok","WC")
df <- data.frame(A=A, B=B, C=C, stringsAsFactors=FALSE)
df$D <- paste(df$A, df$B, df$C, sep=",")
df$D <- gsub("^,|,$", "", gsub(",{2,}", ",", gsub("\\bok\\b", ",", df$D)))
df
A B C D
1 ok WB WC WB,WC
2 WA ok ok WA
3 WA ok WC WA,WC
4 ok ok ok
5 WA WB WC WA,WB,WC
The basic strategy here is to strip out ok
entries, then cleanup the possible dangling commas left from the paste
call.