
Collapse strings in a vector three times for an OR statement in R


I have a vector with multiple strings:

strings <- c("CD4","CD8A")

and I'd like to build an OR statement to pass to grep, like so:

"CD4-|-CD4-|-CD4$|CD8A-|-CD8A-|-CD8A$"

and so on for each element in the vector.
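A minimal sketch of building exactly that pattern in base R. `paste0()` is vectorised, so passing `strings` three times produces one `X-|-X-|-X$` trio per gene, and `collapse = "|"` joins the trios into a single alternation. As written in the question, the first alternative is not anchored, so it would also match a hypothetical longer name ending in `CD4` followed by a dash; prefixing it with `"^"` would restrict it to the start of the string.

```r
strings <- c("CD4", "CD8A")

# One "X-|-X-|-X$" trio per gene (start, middle, end positions),
# then all trios joined with "|" into a single OR pattern
pattern <- paste0(strings, "-|-", strings, "-|-", strings, "$", collapse = "|")
print(pattern)

# Anchored variant: the first alternative only matches at the start of the string
anchored <- paste0("^", strings, "-|-", strings, "-|-", strings, "$", collapse = "|")
print(anchored)
```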

Basically, I'm trying to find an exact word in strings made of three dash-separated parts (I don't want grep("CD4", ...) to also return strings containing CD40). This is how I thought of doing it, but I'm open to other suggestions.

Part of my data.frame looks like this:

Genes <- as.data.frame(c("CD4-MyD88-IL27RA", "IL2RG-CD4-GHR","MyD88-CD8B-EPOR", "CD8A-IL3RA-CSF3R", "ICOS-CD40-LMP1"))
colnames(Genes) <- "Genes"
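Since other suggestions are welcome, one regex-free option is to split each entry on the dash and test the pieces for exact equality with `%in%`. This is a sketch assuming every entry is purely dash-delimited and that no gene name itself contains a dash. It builds the same table with `data.frame()` and a named column; `stringsAsFactors = FALSE` keeps the column as character on R versions before 4.0, where `strsplit()` would otherwise reject a factor.

```r
strings <- c("CD4", "CD8A")
Genes <- data.frame(
  Genes = c("CD4-MyD88-IL27RA", "IL2RG-CD4-GHR", "MyD88-CD8B-EPOR",
            "CD8A-IL3RA-CSF3R", "ICOS-CD40-LMP1"),
  stringsAsFactors = FALSE
)

# Split each entry into its dash-separated parts (fixed = TRUE: literal "-")
parts <- strsplit(Genes$Genes, "-", fixed = TRUE)

# Keep an entry if any of its parts equals one of the targets exactly,
# so "CD40" is not treated as a match for "CD4"
keep <- vapply(parts, function(p) any(p %in% strings), logical(1))
Genes$Genes[keep]
```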

Solution

  • Here is a one-liner:

    Genes$Genes[grep(paste0("\\b",strings,"\\b",collapse="|"),Genes$Genes)]
    
    [1] "CD4-MyD88-IL27RA" "IL2RG-CD4-GHR"    "CD8A-IL3RA-CSF3R"
    

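    It uses the word-boundary marker \\b so that each gene name only matches as a complete substring: - is not a word character, so there is a boundary on either side of it (and at the start and end of the string), whereas CD40 is rejected because there is no boundary between the 4 and the 0.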
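Note that \\b treats any non-word character (a dot, a space, and so on) as a boundary, not only the dash. If dashes alone should delimit gene names, a sketch under that assumption spells the boundaries out explicitly, which is essentially the question's start/middle/end idea folded into one group. It assumes the gene names contain no regex metacharacters (otherwise they would need escaping); `perl = TRUE` is used here simply to get PCRE semantics.

```r
strings <- c("CD4", "CD8A")
genes <- c("CD4-MyD88-IL27RA", "IL2RG-CD4-GHR", "MyD88-CD8B-EPOR",
           "CD8A-IL3RA-CSF3R", "ICOS-CD40-LMP1")

# A target must be preceded by the string start or a dash,
# and followed by a dash or the string end
pattern <- paste0("(^|-)(", paste(strings, collapse = "|"), ")(-|$)")
print(pattern)

genes[grepl(pattern, genes, perl = TRUE)]
```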