I am trying to determine whether drug names which are spread across multiple columns, are present within a string. I am using str_detect, and eventually need to refer to column indices as I don't know how many columns I need to use to search.
Example data:
string.to.search<-c("abc","bc","cd","e","f")
Drug1<-c("b","c","e","f",NA)
Drug2<-c("c","d",NA,NA,NA)
Drug3<-c(NA,NA,NA,NA,NA)
df<-as.data.frame(cbind(string.to.search, Drug1, Drug2, Drug3))
I can do this easily enough when referring to the columns I wish to use as search terms (Drug1 to Drug3 in this instance), and achieve the desired result:
df.new<-df%>%
mutate(Found = str_detect(string.to.search, paste(Drug1, Drug2, Drug3, sep="|")))
String.to.search | Drug1 | Drug2 | Drug3 | Found |
---|---|---|---|---|
a b c | b | c | TRUE | |
b c | c | d | TRUE | |
c d | e | FALSE | ||
e | f | FALSE | ||
f | FALSE |
I have found it is possible to use a column index, but can only make this work for a single column:
df.new2<-df%>%
mutate(Found = str_detect(string.to.search, .[[2]]))
String.to.search | Drug1 | Drug2 | Drug3 | Found |
---|---|---|---|---|
a b c | b | c | TRUE | |
b c | c | d | TRUE | |
c d | e | FALSE | ||
e | f | FALSE | ||
f |
I can't figure out how to make this work across multiple column indices so that I can go from the second to the last column (I might have 1 drug or I might have 10 in future). When I try this I get the following error:
df.new3<-df%>%
mutate(Found = str_detect(string.to.search, .[[2:ncol(df)]]))
Error: Problem with mutate()
column Found
.
i Found = str_detect(string.to.search, .[[2:ncol(df)]])
.
x recursive indexing failed at level 2
When I just look at what is returned by df[2:ncol(df)]
, only the three Drug columns are returned as expected, so I feel like I must be using the correct indices.
Any help greatly appreciated.
If your string.to.search
is always space-delimited (or something easily split), then we can use %in%
instead of string-matches; the latter can provide false-positives if you have sub-string matches.
df <- structure(list(string.to.search = c("a b c", "b c", "c d", "e", "f"), Drug1 = c("b", "c", "e", "f", NA), Drug2 = c("c", "d", NA, NA, NA), Drug3 = c(NA_character_, NA_character_, NA_character_, NA_character_, NA_character_)), row.names = c(NA, -5L), class = "data.frame")
df
# string.to.search Drug1 Drug2 Drug3
# 1 a b c b c <NA>
# 2 b c c d <NA>
# 3 c d e <NA> <NA>
# 4 e f <NA> <NA>
# 5 f <NA> <NA> <NA>
df %>%
mutate(
Found = Reduce(`|`,
lapply(subset(., select=Drug1:Drug3),
function(z) mapply(`%in%`, z, strsplit(string.to.search, " +"))))
)
# string.to.search Drug1 Drug2 Drug3 Found
# 1 a b c b c <NA> TRUE
# 2 b c c d <NA> TRUE
# 3 c d e <NA> <NA> FALSE
# 4 e f <NA> <NA> FALSE
# 5 f <NA> <NA> <NA> FALSE