
Extract min and max value from a character variable with R


I have a data frame with a character variable that holds one or more values with their unit, like this:

[525] "8 µg/ml"
[526] "16 µg/ml - 32 µg/ml - 200 µg/ml - 500 µg/ml - 1000 µg/ml"
[527] "5 µg/ml - 10 µg/ml - 250 µg/ml"
[528] "20 µg/ml"
[529] "16 µg/ml"
[530] "60 µg/ml"                                                

I would like to extract two values (the minimum and the maximum) from this variable into two new variables. When only one value is available, I would like it to be used as the minimum by default. I have tried using str_extract, but I'm sure you will have better advice or solutions. Thanks to all of you for your help.


Solution

  • You can extract all the numbers from each string using str_extract_all and then return the min and max values using range. When a string holds a single value, min and max are equal, so the max is set to NA.

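    # Extract every run of digits per string, then take the range (min, max)
    mat <- t(sapply(stringr::str_extract_all(x, '\\d+'), function(v)
                    range(as.numeric(v))))
    # A single value gives min == max: keep it as the min, set the max to NA
    mat[mat[, 1] == mat[, 2], 2] <- NA
    mat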
    
    #     [,1] [,2]
    #[1,]    8   NA
    #[2,]   16 1000
    #[3,]    5  250
    #[4,]   20   NA
    #[5,]   16   NA
    #[6,]   60   NA
    

    data

    x <- c("8 µg/ml", "16 µg/ml - 32 µg/ml - 200 µg/ml - 500 µg/ml - 1000 µg/ml", 
    "5 µg/ml - 10 µg/ml - 250 µg/ml", "20 µg/ml", "16 µg/ml", "60 µg/ml")
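
As a hedged variation on the answer above, the sketch below uses only base R (regmatches and gregexpr) and writes the results into two new columns of a data frame, which is what the question asks for. The column names dose, min and max and the helper num_or_na are assumptions for illustration, not names from the original data. It also accepts decimal values such as 0.5, and it decides whether to fill max by counting the extracted values rather than by comparing min with max.

```r
# Sample values copied from the question.
x <- c("8 µg/ml", "16 µg/ml - 32 µg/ml - 200 µg/ml - 500 µg/ml - 1000 µg/ml",
       "5 µg/ml - 10 µg/ml - 250 µg/ml", "20 µg/ml", "16 µg/ml", "60 µg/ml")

# 'dose' is a hypothetical column name standing in for the real variable.
df <- data.frame(dose = x, stringsAsFactors = FALSE)

# Pull every number (with an optional decimal part) out of each string.
nums <- regmatches(df$dose, gregexpr("[0-9]+(\\.[0-9]+)?", df$dose))
nums <- lapply(nums, as.numeric)

# Hypothetical helper: apply f only when at least `needed` values were found,
# otherwise return NA.
num_or_na <- function(v, f, needed) {
  if (length(v) >= needed) f(v) else NA_real_
}

# A single value becomes the min; max is filled only when 2+ values exist.
df$min <- vapply(nums, num_or_na, numeric(1), f = min, needed = 1)
df$max <- vapply(nums, num_or_na, numeric(1), f = max, needed = 2)

df[, c("min", "max")]
```

One difference from the matrix approach: a string that repeats the same value twice (for example "5 µg/ml - 5 µg/ml") keeps 5 as both min and max here, whereas comparing the two columns would turn the max into NA.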