Tags: r, regex, string, text-extraction, data-extraction

str_extract_all with decimal numbers


I have this data frame (DF1):

DF1 <- structure(list(ID = 1:3, Temperature = c("temp 37.8 37.6", "37,8 was body temperature", "110 kg and 38 temp")), class = "data.frame", row.names = c(NA, -3L))

ID Temperature
1  "temp 37.8 37.6"
2  "37,8 was body temperature"
3  "110 kg and 38 temp"

And this pattern:

Pattern <- paste(c("temp", "Temperature"),collapse="|") 

I would like to add a new column that contains every number found in the string, including decimal numbers. The decimal separator can be either "," or ".".

So I would like to get this:

ID Temperature                  Number
1  "temp 37.8 37.6"             c(37.8,37.6)
2  "37,8 was body temperature"  37,8
3  "110 kg and 38 temp"         c(110, 38)

I have tried this:

mutate(Number = ifelse(grepl(Pattern, Temperature), str_extract_all(Temperature, "\\s(.*[0-9])$ | \\s(,*[0-9])$"), "no"))

But this regex only gives me an empty string.


Solution

  • You can use:

    stringr::str_extract_all(DF1$Temperature, '\\d+([.,]\\d+)?')
    
    #[[1]]
    #[1] "37.8" "37.6"
    
    #[[2]]
    #[1] "37,8"
    
    #[[3]]
    #[1] "110" "38" 
    

    where:

    \\d+ - one or more digits, followed by

    ( ... )? - an optional group made up of

    [.,] - a dot or a comma, and

    \\d+ - one or more digits.
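
    To get the result as a new column, as the question asks, the same regex can be used inside mutate(). The following is only a minimal sketch, assuming dplyr and stringr are installed; the names num_regex and NumberNumeric are made up for this illustration and are not part of the original question. Number is stored as a list-column because each row can hold a different count of matches.

```r
library(dplyr)
library(stringr)

# Sample data from the question
DF1 <- data.frame(
  ID = 1:3,
  Temperature = c("temp 37.8 37.6", "37,8 was body temperature", "110 kg and 38 temp"),
  stringsAsFactors = FALSE
)

# Hypothetical helper name: digits, optionally followed by "." or "," and more digits
num_regex <- "\\d+([.,]\\d+)?"

DF1 <- DF1 %>%
  mutate(
    # List-column: one character vector of matches per row
    Number = str_extract_all(Temperature, num_regex),
    # Optional extra (an assumption, not requested in the question):
    # normalise the decimal comma to a dot and convert to numeric
    NumberNumeric = lapply(Number, function(x) as.numeric(sub(",", ".", x, fixed = TRUE)))
  )

DF1$Number[[2]]         # "37,8"
DF1$NumberNumeric[[2]]  # 37.8
```

    Note that all three sample rows contain "temp", so the Pattern check from the question would not exclude any of them here; if it is still needed, grepl(Pattern, Temperature) could be used to filter rows before extracting.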