r, string, dataframe, substring

Remove data frame rows if keys contain a case-insensitive substring


I have a dataset like the following:

FirstName Letter   
Alexsmith     A1
ThegreatAlex      A6
AlexBobJones1      A7
Bobsmiles222       A1
Christopher     A9
Christofer     A6

I want to remove all rows that contain, for example, "Alex" (or "alex", "aLex", etc.) anywhere in the FirstName value. I tried grep("Alex") but got stuck combining dplyr with base R, and grep seems to want a vector rather than a data frame.

Thanks! Happy to clarify any questions.


Solution

    dat <- structure(list(FirstName = c("Alexsmith", "ThegreatAlex", "AlexBobJones1",
    "Bobsmiles222", "Christopher", "Christofer"), Letter = c("A1", 
    "A6", "A7", "A1", "A9", "A6")), class = "data.frame", row.names = c(NA, -6L))
    #      FirstName Letter
    #1     Alexsmith     A1
    #2  ThegreatAlex     A6
    #3 AlexBobJones1     A7
    #4  Bobsmiles222     A1
    #5   Christopher     A9
    #6    Christofer     A6
    

    Here is one way:

    dat[-grep("[Aa][Ll][Ee][Xx]", dat$FirstName), ]
    #     FirstName Letter
    #4 Bobsmiles222     A1
    #5  Christopher     A9
    #6   Christofer     A6
    

    Thanks to Ritchie Sacramento for the hint that grep accepts an ignore.case argument; I had not noticed it before. So we can do:

    dat[grep("alex", dat$FirstName, ignore.case = TRUE, invert = TRUE), ]
    

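    With invert = TRUE, we don't need - before grep for negative indexing. This is safer when nothing matches: grep then returns integer(0), and dat[-integer(0), ] selects zero rows instead of keeping them all.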
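
To make the no-match case concrete, here is a small sketch. The pattern "zzz" and the variable names no_match_negative, no_match_invert, keep and result are made up for illustration, and the data is rebuilt with data.frame() so the snippet runs on its own. It also shows grepl, which is not used above but is another base R option that returns a logical vector.

```r
# Same example data as above, rebuilt so this snippet is self-contained.
dat <- data.frame(
  FirstName = c("Alexsmith", "ThegreatAlex", "AlexBobJones1",
                "Bobsmiles222", "Christopher", "Christofer"),
  Letter = c("A1", "A6", "A7", "A1", "A9", "A6")
)

# Pitfall: with no match, grep() returns integer(0). Negating integer(0)
# is still integer(0), so the subset ends up with no rows at all.
no_match_negative <- dat[-grep("zzz", dat$FirstName, ignore.case = TRUE), ]
nrow(no_match_negative)
# 0

# invert = TRUE returns the indices of the non-matching rows instead,
# so a pattern that matches nothing keeps every row.
no_match_invert <- dat[grep("zzz", dat$FirstName, ignore.case = TRUE, invert = TRUE), ]
nrow(no_match_invert)
# 6

# grepl() gives TRUE/FALSE per row; negating it with ! is also safe
# when nothing matches, because the result is then all TRUE.
keep <- !grepl("alex", dat$FirstName, ignore.case = TRUE)
result <- dat[keep, ]
```

Since the question mentions dplyr: assuming dplyr is installed, the same logical test should work inside a filter call, e.g. dplyr::filter(dat, !grepl("alex", FirstName, ignore.case = TRUE)).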