Tags: r, regex, numbers

Extract valid numbers from character vector in R


Suppose I have the following character vector:

c("hi", "4", "-21", "6.5", "7. 5", "-2.2", "4h")

I want to extract only the valid numbers from this vector:

c("4", "-21", "6.5", "-2.2")

Note: "7. 5" has a space between the . and the 5, so it is not a valid number.

I tried the regex /^-?(0|[1-9]\\d*)(\\.\\d+)?$/, which is given here, but had no luck.

So what regex would extract the valid numbers from a character vector?


Solution

  • We can use grep with a pattern that matches an optional leading -, followed by one or more digits or . characters, from the start (^) to the end ($) of the string

    v1 <- c("hi", "4", "-21", "6.5", "7. 5", "-2.2", "4h")  # vector from the question
    grep("^-?[0-9.]+$", v1, value = TRUE)
    [1] "4"    "-21"  "6.5"  "-2.2"
    

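    Or, to handle fringe cases such as "4.1.1" (more than one .), require digits followed by at most one fractional part. Note that [ -]? accepts either a minus sign or a single leading space: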

    grep("^[ -]?[0-9]+(\\.\\d+)?$", c(v1, "4.1.1"), value = TRUE)
    [1] "4"    "-21"  "6.5"  "-2.2"
    
    grep("^[ -]?[0-9]+(\\.\\d+)?$", c(v1, "4.1.1", " 2.9"), value = TRUE)
    [1] "4"    "-21"  "6.5"  "-2.2" " 2.9"
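
As a rough sketch of why the first pattern needs tightening and why the pattern from the question returned nothing, the snippet below may help. The variable names (`loose`, `strict`, `with_slashes`, `no_slashes`) are illustrative, and `strict` is an assumed variant of the pattern above that drops the leading-space allowance. It is not necessarily what the original answer intended.

```r
# Sample vector from the question; `v1` is the name the answer assumes.
v1 <- c("hi", "4", "-21", "6.5", "7. 5", "-2.2", "4h")

# The loose character-class pattern also accepts malformed strings,
# because [0-9.]+ allows any mix of digits and dots.
loose <- "^-?[0-9.]+$"
loose_hits <- grepl(loose, c("4.1.1", "..", "5."))

# Assumed stricter variant: optional minus, digits, optional fractional part.
strict <- "^-?[0-9]+(\\.[0-9]+)?$"
valid <- grep(strict, v1, value = TRUE)
print(valid)

# JavaScript-style /.../ delimiters are literal slashes in an R pattern,
# so no element of v1 can match.
with_slashes <- grep("/^-?(0|[1-9]\\d*)(\\.\\d+)?$/", v1, value = TRUE)

# Dropping the slashes lets the question's pattern work on this vector.
no_slashes <- grep("^-?(0|[1-9]\\d*)(\\.\\d+)?$", v1, value = TRUE)
```

On this input, `valid` and `no_slashes` both contain "4", "-21", "6.5" and "-2.2", while `with_slashes` is empty. The question's pattern additionally rejects integers with leading zeros such as "007", which may or may not be wanted.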