r delimiter strsplit

Separate on two different delimiters and cut off the rest


I have a tibble, e.g.

a <- as_tibble(c("201.1, 202 (abc) 203, 204", "301 / 302.22 def, 303"))

  value                  
  <chr>                  
1 201.1, 202 (abc) 203, 204
2 301 / 302.22 def, 303    

Now I would like to get a data.frame with two columns:

[1,] 201.1  202
[2,] 301    302.22

by cutting off everything after the second number (202 in the first row, 302.22 in the second row) and splitting the remaining part of the expression on the delimiter "," or "/" to get the two columns.


Solution

  • Here are several approaches.

    1) separate. Use separate() from tidyr as shown here. The sep= regex splits on runs of commas, slashes and spaces, extra = "drop" discards every piece after the second, and convert = TRUE converts the resulting columns to numeric.

    library(tidyr)
    
    a %>%
      separate("value", c("value1", "value2"), sep = "[,/ ]+",
        extra = "drop", convert = TRUE)
    
    ## # A tibble: 2 × 2
    ##   value1 value2
    ##    <dbl>  <dbl>
    ## 1   201.   202 
    ## 2   301    302.
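The tibble printout above shows 201. and 302. only because tibbles print three significant digits by default; the stored values are not rounded. A minimal sketch to check this, assuming tidyr and tibble are installed (the name res is introduced here just for illustration):

```r
library(tibble)
library(tidyr)

# Input from the question, built with tibble() so the column is named "value".
a <- tibble(value = c("201.1, 202 (abc) 203, 204", "301 / 302.22 def, 303"))

# Same call as in (1), written without the pipe.
res <- separate(a, "value", c("value1", "value2"), sep = "[,/ ]+",
  extra = "drop", convert = TRUE)

# A plain data.frame prints the values without the tibble display rounding.
as.data.frame(res)
```

Converting to a data.frame is only one way to see the full values; inspecting res$value1 and res$value2 directly works as well.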
    

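    2) Base R. Use strcapture() like this. It ships with R (in the utils package), so no additional packages are needed. The regex captures the first two numbers, and the third argument is a prototype data frame that supplies the column names and types.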

    strcapture("([0-9.]+)[^0-9.]+([0-9.]+).*", a$value, 
      data.frame(value1 = numeric(0), value2 = numeric(0)))
    
    ##   value1 value2
    ## 1  201.1 202.00
    ## 2  301.0 302.22
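To see what each part of the regex in (2) matches, the capture groups can be inspected directly with regexec() and regmatches(). This is only an illustrative sketch of the same pattern, not a replacement for strcapture(); the names x, pat, m and res are chosen here for the example.

```r
# Input from the question, as a plain character vector (no packages needed).
x <- c("201.1, 202 (abc) 203, 204", "301 / 302.22 def, 303")

# Same pattern as in (2):
#   ([0-9.]+)   first number (digits and dots)
#   [^0-9.]+    one or more characters that are not digits or dots (the delimiter)
#   ([0-9.]+)   second number
#   .*          the rest of the string, which is discarded
pat <- "([0-9.]+)[^0-9.]+([0-9.]+).*"

# regexec() locates the full match and each capture group;
# regmatches() extracts them as one character vector per input string.
m <- regmatches(x, regexec(pat, x))

# Element 1 of each vector is the full match; elements 2 and 3 are the groups.
res <- as.data.frame(t(sapply(m, function(g) as.numeric(g[2:3]))))
names(res) <- c("value1", "value2")
res
```

Because the character class [0-9.] also accepts stray dots, this sketch assumes, as the question's data suggests, that every dot belongs to a number.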
    

    3) read.pattern. Use read.pattern() from gsubfn with the same regex as in (2). It determines automatically that the columns are numeric, and it takes the same text= and col.names= arguments as read.table, which makes them easy to remember if you are familiar with that function.

    library(gsubfn)
    
    read.pattern(text = a$value, pattern = "([0-9.]+)[^0-9.]+([0-9.]+).*", 
      col.names = c("value1", "value2"))
    
    ##   value1 value2
    ## 1  201.1 202.00
    ## 2  301.0 302.22
    

    Note

    The input from the question in reproducible form:

    library(tibble)
    a <- as_tibble(c("201.1, 202 (abc) 203, 204", "301 / 302.22 def, 303"))