
How to subset a string in R


Dear all, I have a vector of strings like:

LOCAT01PE
WECAT013EJD
AFECAT0155DR

I want to extract from each value only CAT and all the digits that follow it:

CAT01
CAT013
CAT0155

I have tried the substr command, but it won't work because neither the number of characters before CAT nor the number of digits after it is fixed.


Solution

  • We can use regexpr/regmatches in base R. The pattern matches the literal 'CAT', then an optional hyphen (-?), then one or more digits (\\d+).

    regmatches(x, regexpr("CAT-?\\d+", x))
    #[1] "CAT01"    "CAT013"   "CAT0155"  "CAT-01"   "CAT-013"  "CAT-0155"
    

    data

    x <- c('LOCAT01PE', 'WECAT013EJD', 'AFECAT0155DR', 
        'LO-CAT-01PE', 'WE-CAT-013-EJD', 'AFE-CAT-0155-DR')
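
One thing to keep in mind with the approach above: regmatches() drops elements that have no match, so the result can be shorter than the input. If the result has to stay aligned with the original vector, a minimal sketch along these lines may help. The vector x_demo and its 'NOMATCH' element are made up for illustration and are not part of the original data.

```r
# Hypothetical input: the last element contains no "CAT<digits>" token
x_demo <- c('LOCAT01PE', 'WECAT013EJD', 'AFECAT0155DR', 'NOMATCH')

# regexpr() returns -1 for elements where the pattern is not found
m <- regexpr("CAT-?\\d+", x_demo)

# Start with NA everywhere, then fill in only the positions that matched
out <- rep(NA_character_, length(x_demo))
out[m != -1] <- regmatches(x_demo, m)

out
#[1] "CAT01"   "CAT013"  "CAT0155" NA
```

Here out has the same length as x_demo, with NA marking the elements where nothing was found.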