Tags: r, tidyverse, subset, tidyr, grepl

Filter & Subset if a String Contains Certain Characters at a Specific Position (in R)


I want to subset a data frame to the rows whose string has a number from 01 to 12 at characters 11-12, not counting the hyphens (characters 14-15 if the "-" separators are counted as characters). I tried grepl but could not get it to work.

Data sample:

library(data.table)
x <- data.table(c('ACCN-NJ-A55O-01A-11D-A25L-08','ACCN-NJ-A55O-11D-11D-A25L-08', 'ACCN-05-4249-01A-01D-1105-08', 'ACCN-S2-AA1A-15C-12D-A397-08'))

Expected output (rows 1, 2 and 3 are returned):

ACCN-NJ-A55O-01A-11D-A25L-08
ACCN-NJ-A55O-11D-11D-A25L-08
ACCN-05-4249-01A-01D-1105-08

Any help would be appreciated. Thanks in advance


Solution

  • If the position is fixed, you can use substr/substring to extract the characters at that position and compare them as an integer.

    subset(x, as.integer(substr(V1, 14, 15)) %in% 1:12)
    
    #                             V1
    #1: ACCN-NJ-A55O-01A-11D-A25L-08
    #2: ACCN-NJ-A55O-11D-11D-A25L-08
    #3: ACCN-05-4249-01A-01D-1105-08
    

    Using dplyr:

    library(dplyr)
    x %>% filter(between(as.integer(substr(V1, 14, 15)), 1, 12))
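
Since the question mentions grepl, here is a minimal sketch of a regex-based alternative in base R. It is not part of the original answer. The names ids, keep_fixed, fourth_field and keep_field are made up for illustration. The second variant assumes the two-digit code always starts the fourth "-"-separated field, which holds for the sample data but is an assumption about the real data.

```r
# Sample IDs from the question, as a plain character vector (base R only)
ids <- c("ACCN-NJ-A55O-01A-11D-A25L-08",
         "ACCN-NJ-A55O-11D-11D-A25L-08",
         "ACCN-05-4249-01A-01D-1105-08",
         "ACCN-S2-AA1A-15C-12D-A397-08")

# Fixed position: skip the first 13 characters, then require
# 01-09 or 10-12 at characters 14-15
keep_fixed <- grepl("^.{13}(0[1-9]|1[0-2])", ids)

# Field-based (assumption: the code is the start of the 4th "-" field),
# useful if the earlier fields can vary in length
fourth_field <- vapply(strsplit(ids, "-", fixed = TRUE), `[`, character(1), 4)
keep_field <- grepl("^(0[1-9]|1[0-2])", fourth_field)

# Keep only the matching IDs
ids[keep_fixed]
```

With the data.table from the question, the same pattern should be usable as x[grepl("^.{13}(0[1-9]|1[0-2])", V1)]. Unlike a plain numeric comparison, the pattern rejects "00" and non-numeric characters without producing coercion warnings.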