
R partial match in data frame


How can I handle a partial match in a data frame? Let's say this is my data frame df:

   V1  V2  V3 V4
1 ABC 1.2 4.3  A
2 CFS 2.3 1.7  A
3 dgf 1.3 4.4  A

and I want to add a column V5 containing the number 111 only if the value in V1 contains an "f", and the number 222 only if the value in V1 contains "gf". Will I run into problems since several values contain an "f", or will the order in which I enter the commands take care of it?

I tried something like:

df$V5<- ifelse(df$V1 = c("*f","*gf"),c=(111,222) )

but it does not work.

The main problem is: how can I tell R to look for a partial match?

Thanks a million for your help!


Solution

  • Besides the solution of setting the values in sequence for "f", "gf", ..., it's worth having a look at the regular expression support for zero-width lookahead / lookbehind.

    If you want to grep all elements that contain an "f" not directly preceded by a "g", you can use a negative lookbehind:

    v1 <- c("abc", "f", "gf" )
    grep( "(?<![g])f" , v1, perl= TRUE )
    [1] 2
    

    and if you want to grep only those that contain an "f" not directly followed by a "g", you can use a negative lookahead:

    v2 <- c("abc", "f", "fg")
    grep( "f(?![g])" , v2, perl= TRUE )
    [1] 2
    

    And of course you can combine the two:

    v3 <- c("abc", "f", "fg", "gf")
    grep( "(?<![g])f(?![g])" , v3, perl= TRUE )
    [1] 2
    

    So for your case you can do:

    df[ grep( "(?<![g])f" , df$V1, perl= TRUE ), "V5" ] <- 111
    df[ grep( "gf" , df$V1, perl= TRUE ), "V5" ] <- 222
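
For comparison, here is a minimal sketch of the "set the values in sequence" approach mentioned at the top, using grepl() on the data frame from the question. The data frame is rebuilt from the values in the post; initialising V5 to NA and the helper name v5_alt are my own assumptions, not part of the original answer. Treat it as one possible way to do it rather than the definitive one.

```r
# Rebuild the data frame from the question (values copied from the post).
df <- data.frame(
  V1 = c("ABC", "CFS", "dgf"),
  V2 = c(1.2, 2.3, 1.3),
  V3 = c(4.3, 1.7, 4.4),
  V4 = c("A", "A", "A"),
  stringsAsFactors = FALSE
)

# Assumption: start with NA so rows matching neither pattern stay unset.
df$V5 <- NA_real_

# Option 1: assign in sequence. The more specific pattern ("gf") goes last,
# so it overwrites the 111 set by the more general pattern ("f").
# fixed = TRUE treats the pattern as a literal substring, not a regex.
df$V5[grepl("f",  df$V1, fixed = TRUE)] <- 111
df$V5[grepl("gf", df$V1, fixed = TRUE)] <- 222

# Option 2 (v5_alt is a hypothetical name): a nested ifelse() that tests
# the more specific pattern first, so no overwriting is needed.
v5_alt <- ifelse(grepl("gf", df$V1, fixed = TRUE), 222,
          ifelse(grepl("f",  df$V1, fixed = TRUE), 111, NA_real_))

print(df)
```

With the sample data only "dgf" matches, so V5 ends up as NA, NA, 222. Note that the match is case-sensitive, which is why "CFS" is not picked up by "f". So the order does matter with this approach: the pattern that should win has to be applied last (Option 1) or tested first (Option 2), whereas the lookbehind version above makes the two patterns mutually exclusive.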