Transform a string that contains positive/negative numbers with decimals into numeric in R


I have the following vector:

x = c("=-1000.51$^{*}$", "=.0038374", "=-2.91823e-09", "=7.290392e-09","=2.254938e-08$^{**}$") 

What is the fastest way to transform this vector into a numeric vector like:

x = c(-1000.51, .0038374, -2.91823e-09,7.290392e-09,2.254938e-08) 

using R?

I tried

x %>% str_match_all("[0-9]+") %>% unlist %>% as.numeric

but it doesn't work as I hoped. Thank you for your kind help.


Solution

  • Here are a few one-liners. The last three are taken from the comments. Note that edge cases may behave differently for different solutions. For example, the first two allow digits within the junk but the third one does not (however that does not occur in the question example so it may not matter). The fourth one assumes that the junk at the end starts with $ (which is always the case in the question example).

    as.numeric(trimws(trimws(x, "left", "="), "right", "\\D"))
    
    # same as above, written with the native pipe (requires R >= 4.1)
    x |> trimws("left", "=") |> trimws("right", "\\D") |> as.numeric()
    
    # "." is kept in the class so a leading decimal point (".0038374") is not trimmed away
    as.numeric(trimws(x, "both", "[^-0-9.]"))
    
    read.table(text = sub("=", "", x), comment.char = "$")[[1]]
    
    library(gsubfn)
    # the match may start with "." so that ".0038374" is not read as 38374
    strapply(x, "-?[.0-9].*\\d", as.numeric, simplify = TRUE)
    
    library(readr)
    parse_number(x)
    
    as.numeric(gsub("[^-+0-9.e]", "", x))
    
    library(stringr)
    str_match_all(x, "[-+0-9.e]+") |> unlist() |> as.numeric()
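
To see how the edge cases mentioned above can matter, here is a minimal base R sketch. It is not one of the solutions from the answer or the comments: `extract_number` and `num_pattern` are hypothetical names introduced only for illustration. It assumes each string contains at most one number of interest and takes the first substring that looks like a complete numeric literal (optional sign, digits with an optional decimal point, optional exponent), so digits inside the trailing junk are ignored.

```r
x <- c("=-1000.51$^{*}$", "=.0038374", "=-2.91823e-09",
       "=7.290392e-09", "=2.254938e-08$^{**}$")
expected <- c(-1000.51, .0038374, -2.91823e-09, 7.290392e-09, 2.254938e-08)

# Hypothetical pattern: sign, then "digits[.digits]" or ".digits", then exponent
num_pattern <- "[-+]?(\\d+\\.?\\d*|\\.\\d+)([eE][-+]?\\d+)?"

# Hypothetical helper: first numeric-looking substring per element, NA if none
extract_number <- function(s) {
  m <- regexpr(num_pattern, s, perl = TRUE)
  out <- rep(NA_character_, length(s))
  out[m != -1] <- regmatches(s, m)   # regmatches() returns only the matches
  as.numeric(out)
}

res <- extract_number(x)
stopifnot(all(abs(res - expected) <= 1e-12 * abs(expected)))

# Edge case: a digit inside the junk. Deleting unwanted characters glues the
# stray digit onto the number, while matching the first literal does not.
y <- "=-1000.51$^{2}$"
as.numeric(gsub("[^-+0-9.e]", "", y))   # -1000.512
extract_number(y)                       # -1000.51
```

The trade-off is a longer pattern in exchange for stricter matching. For input shaped exactly like the question's example, the shorter one-liners may be all that is needed.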