Tags: r, regex, pcre, regex-lookarounds

Find strings in R with this specific pattern "digitsXdigits"


I'm trying to clean a list of strings by finding the ones that contain a particular pattern. I am using grepl(), but I do not know how to write the regex for it.

The pattern is: digits, then a middle part that must include an x and may also include special characters or a letter, then digits again.

Here are some examples:           OUTPUT from grepl()
"kills kld ldks 2087x-2714"     TRUE
"sdlsn dklsk 4.75x25"           TRUE
"dkks klsdk  3x4x135"           TRUE
"djnlsdkl250shd"                FALSE
"kdls, skfndkl 24gx.75"         TRUE
"ski lsdkcm lskd 12.6"          FALSE
"klslc ksldml 3.0 dnjsl 67n030" FALSE

It's a somewhat complicated pattern. Basically, there must be digits on both sides of the x, but special characters, letters, or more numbers can also appear in the mix.


Solution

  • Using str_detect from the stringr package. I've added two additional test strings at the end of x.

    The pattern is: a digit, zero or one non-whitespace character, an x, zero or one non-whitespace character, then a digit. Because the pattern is not anchored, it can match anywhere inside the string.

    library(stringr)  # provides str_detect()

    x <- c("kills kld ldks 2087x-2714",
           "sdlsn dklsk 4.75x25",
           "dkks klsdk  3x4x135",
           "djnlsdkl250shd",
           "kdls, skfndkl 24gx.75",
           "ski lsdkcm lskd 12.6",
           "klslc ksldml 3.0 dnjsl 67n030",
           "5x25",
           "kdls skfndkl x24g.75")
    
    str_detect(x, "\\d\\S?x\\S?\\d")
    
    #[1]  TRUE  TRUE  TRUE FALSE  TRUE FALSE FALSE  TRUE FALSE
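
Since the question asks about grepl(), here is a minimal sketch of the same check in base R. It assumes the pattern from the answer above is what you want; the variable names `pattern` and `matches` are only illustrative. Passing perl = TRUE selects the PCRE engine, where \d and \S are standard character classes.

```r
# Same test strings as in the stringr example above
x <- c("kills kld ldks 2087x-2714",
       "sdlsn dklsk 4.75x25",
       "dkks klsdk  3x4x135",
       "djnlsdkl250shd",
       "kdls, skfndkl 24gx.75",
       "ski lsdkcm lskd 12.6",
       "klslc ksldml 3.0 dnjsl 67n030",
       "5x25",
       "kdls skfndkl x24g.75")

# digit, optional non-whitespace character, x,
# optional non-whitespace character, digit
pattern <- "\\d\\S?x\\S?\\d"

# perl = TRUE uses PCRE; no extra package is needed
matches <- grepl(pattern, x, perl = TRUE)
print(matches)
#[1]  TRUE  TRUE  TRUE FALSE  TRUE FALSE FALSE  TRUE FALSE

# Keep only the strings that contain the pattern
x[matches]
```

To drop the matching strings instead of keeping them, subset with `x[!matches]`.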