r, numbers, extract, readr

readr::parse_number with leading zero


I would like to parse numbers that have a leading zero.

I tried readr::parse_number, but it drops the leading zero.

library(readr)

parse_number("thankyouverymuch02")
#> [1] 2

Created on 2022-12-30 with reprex v2.0.2

The desired output would be 02


Solution

  • The simplest and most naive approach is to strip every non-digit character and keep the result as a string:

    gsub("\\D", "", "thankyouverymuch02")
    # [1] "02"
    

    The regex escape "\\d" matches a single digit (0-9); its inverse, "\\D", matches any single character that is not a digit.
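
    As a small illustration of the two character classes, here is a base-R sketch. The variable names (x, digits, non_digits, as_number) are made up for this example. It also shows why the result has to stay a character string: once it is converted to a number, the leading zero is gone again.

    ```r
    x <- "thankyouverymuch02"

    # "\\D" matches every non-digit, so removing those leaves only the digits
    digits <- gsub("\\D", "", x)
    print(digits)
    # [1] "02"

    # "\\d" matches every digit, so removing those leaves everything else
    non_digits <- gsub("\\d", "", x)
    print(non_digits)
    # [1] "thankyouverymuch"

    # a numeric value cannot carry a leading zero
    as_number <- as.numeric(digits)
    print(as_number)
    # [1] 2
    ```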

    If a string contains several separate groups of digits and you want to keep them distinct, neither parse_number nor this simple gsub will work, because gsub runs all the digits together:

    vec <- c("thankyouverymuch02", "thank03youverymuch02")
    gsub("\\D", "", vec)
    # [1] "02"   "0302"
    

    For that, the result has to be a list, since we don't know a priori how many elements have zero, one, or more number groups.

    vec <- c("thankyouverymuch02", "thank03youverymuch02")
    regmatches(vec, gregexpr("\\d+", vec))
    # [[1]]
    # [1] "02"
    # [[2]]
    # [1] "03" "02"
    
    # equivalently, with stringr
    stringr::str_extract_all(vec, "\\d+")
    # [[1]]
    # [1] "02"
    # [[2]]
    # [1] "03" "02"
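
    If you eventually need one string per input element rather than a list, one possible follow-up is to pick a single group from each list entry. The sketch below assumes you want the last group and NA when an element has no digits at all; last_group is a hypothetical helper written for this example, not part of readr or stringr, and the third input element is added only to show the no-match case.

    ```r
    vec <- c("thankyouverymuch02", "thank03youverymuch02", "nodigits")

    # one list entry per input element; character(0) where nothing matched
    groups <- regmatches(vec, gregexpr("\\d+", vec))

    # how many number groups each element has
    n_groups <- lengths(groups)
    print(n_groups)
    # [1] 1 2 0

    # hypothetical helper: last digit group of one element, NA if there is none
    last_group <- function(g) if (length(g) > 0) g[[length(g)]] else NA_character_

    # vapply guarantees a character vector with exactly one value per element
    last <- vapply(groups, last_group, character(1))
    print(last)
    # [1] "02" "02" NA
    ```

    Swapping g[[length(g)]] for g[[1]] would keep the first group instead.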