I have a vector of strings:
strings <- c("CD4","CD8A")
and I'd like to build an OR pattern from it to pass to grep, like so:
"CD4-|-CD4-|-CD4$|CD8A-|-CD8A-|-CD8A$"
and so on for each element of the vector.
Basically, I'm trying to match an exact word within a string of dash-separated names (I don't want grep("CD4", ...) to return strings containing CD40). This is how I thought of doing it, but I'm open to other suggestions.
Part of my data.frame looks like this:
Genes <- as.data.frame(c("CD4-MyD88-IL27RA", "IL2RG-CD4-GHR","MyD88-CD8B-EPOR", "CD8A-IL3RA-CSF3R", "ICOS-CD40-LMP1"))
colnames(Genes) <- "Genes"
Here is a one-liner:
Genes$Genes[grep(paste0("\\b",strings,"\\b",collapse="|"),Genes$Genes)]
[1] "CD4-MyD88-IL27RA" "IL2RG-CD4-GHR" "CD8A-IL3RA-CSF3R"
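It uses the word-boundary marker \b (written "\\b" inside an R string) on both sides of each name, so only complete words are matched. Because the - does not count as part of a word, CD4 matches in "IL2RG-CD4-GHR" but not in "ICOS-CD40-LMP1".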
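
For reference, here is a small sketch that shows the pattern paste0() builds and compares it with a regex-free alternative: split each entry on the dashes and test membership with %in%. The names genes, pattern, parts and hit are only illustrative, and the split approach is not part of the one-liner above. It assumes the names are always separated by a single "-".

```r
strings <- c("CD4", "CD8A")
genes <- c("CD4-MyD88-IL27RA", "IL2RG-CD4-GHR", "MyD88-CD8B-EPOR",
           "CD8A-IL3RA-CSF3R", "ICOS-CD40-LMP1")

# Pattern built by the one-liner: each name wrapped in \b, joined with |
pattern <- paste0("\\b", strings, "\\b", collapse = "|")
cat(pattern, "\n")  # shows the regex as the regex engine sees it

# Regex-based selection, as in the one-liner
by_regex <- grepl(pattern, genes)

# Alternative sketch: split on the literal dash and test exact membership
parts <- strsplit(genes, "-", fixed = TRUE)
hit <- vapply(parts, function(p) any(p %in% strings), logical(1))

genes[hit]
```

The split version may be worth considering if a name could ever contain characters that are special in a regular expression, since %in% compares the pieces literally.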