Tags: r, regex, string, extract, stringr

Return any words beginning with a space + specified letters + a number


In R, how can I extract only the words that are preceded by a space and begin with "t" or "r" directly followed by a number (including negative numbers and decimals), and return NA for any element where these conditions are not met?


Edit: the numbers may also be decimals.

For example:

testvec <- c("random stuff here","words 10293","random t101rando 101 000","r10000","stuff i-10283","word1 t-12.34 stuff rand10293","random100 u-1000"," r10.0 x ","test x-2930"," T r.1234567","testword120num")

Desired result using test data above (testvec):

desired_result <- c(NA,NA,"t101rando",NA,NA,"t-12.34",NA,"r10.0",NA,"r.1234567",NA)

Solution

  • Try:

    regmatches(testvec, regexpr("(?<= )[tr]-?\\.?\\d\\S*", testvec, perl = TRUE), invert = TRUE) <- ""
    testvec
    # [1] ""          ""          "t101rando" ""          ""          "t-12.34"  
    # [7] ""          "r10.0"     ""          "r.1234567" ""         
    

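    Here invert is TRUE, so everything outside the first match in each element is replaced with "" and only the matched word remains. Note that elements without a match become "" rather than NA.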
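If NA is required for the non-matching elements, as in the question, one possible base R variant is to leave testvec untouched, start from a vector of NA, and fill in only the positions where regexpr() found a match. This is a minimal sketch using the same pattern as above; the names pattern, hit and result are introduced here for illustration only.

```r
testvec <- c("random stuff here", "words 10293", "random t101rando 101 000",
             "r10000", "stuff i-10283", "word1 t-12.34 stuff rand10293",
             "random100 u-1000", " r10.0 x ", "test x-2930",
             " T r.1234567", "testword120num")

# Same pattern as in the answer above: a space (lookbehind), then "t" or "r",
# an optional "-", an optional ".", a digit, then any non-space characters.
pattern <- "(?<= )[tr]-?\\.?\\d\\S*"

# regexpr() returns the start of the first match per element, or -1 if none.
m <- regexpr(pattern, testvec, perl = TRUE)
hit <- as.vector(m) != -1

# Start with NA everywhere, then fill in the matched substrings.
# regmatches(x, m) returns one string per element that had a match.
result <- rep(NA_character_, length(testvec))
result[hit] <- regmatches(testvec, m)
result
```

With the test data above, result should equal desired_result from the question. Since the question is also tagged stringr, stringr::str_extract(testvec, pattern) would likely be a shorter route, because it returns NA for elements without a match; that alternative is not tested here.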