I have a tibble, e.g.
a <- as_tibble(c("201.1, 202 (abc) 203, 204", "301 / 302.22 def, 303"))
value
<chr>
1 201.1, 202 (abc) 203, 204
2 301 / 302.22 def, 303
Now I would like to get a data.frame with two columns
[1,] 201.1 202
[2,] 301 302.22
by cutting everything after the second number (202 in the first row, 302.22 in the second row) and splitting the remaining part of the expression on the delimiter "," or "/" to get the two columns.
Here are several approaches.
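1) separate Use separate from tidyr like this, giving the tibble shown. With convert = TRUE it automatically determines that the columns are numeric. The tibble printout shows only three significant digits, but the full values (201.1 and 302.22) are stored.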
library(tidyr)
a %>%
  separate("value", c("value1", "value2"), sep = "[,/ ]+",
           extra = "drop", convert = TRUE)
## # A tibble: 2 × 2
## value1 value2
## <dbl> <dbl>
## 1 201. 202
## 2 301 302.
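To make the splitting logic in (1) explicit, here is a rough base R sketch of the same idea: split on runs of commas, slashes and spaces, then keep only the first two pieces. It is not how separate is implemented internally, and the names x, parts, m and res are only for illustration.

```r
# Same strings as in the question, as a plain character vector
x <- c("201.1, 202 (abc) 203, 204", "301 / 302.22 def, 303")

# Split on one or more of comma, slash or space (same regex as sep= above)
parts <- strsplit(x, "[,/ ]+")

# Keep the first two pieces of each row and convert them to numeric;
# vapply returns a 2 x n matrix, so transpose it to n x 2
m <- t(vapply(parts, function(p) as.numeric(p[1:2]), numeric(2)))

res <- data.frame(value1 = m[, 1], value2 = m[, 2])
res
```

Dropping everything after the second piece is what extra = "drop" does in the separate call.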
2) Base R Use strcapture from base R like this. No packages are needed.
strcapture("([0-9.]+)[^0-9.]+([0-9.]+).*", a$value,
           data.frame(value1 = numeric(0), value2 = numeric(0)))
## value1 value2
## 1 201.1 202.00
## 2 301.0 302.22
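If it helps to see what the two capture groups in that regex pick up, the following sketch pulls them out with regexec and regmatches. It is only meant to illustrate the pattern, and the names x, pat, m and res are made up for the example.

```r
x <- c("201.1, 202 (abc) 203, 204", "301 / 302.22 def, 303")

# ([0-9.]+)   first number (digits and dots)
# [^0-9.]+    the separator: anything that is not a digit or dot
# ([0-9.]+)   second number
# .*          the rest of the string, which is ignored
pat <- "([0-9.]+)[^0-9.]+([0-9.]+).*"

# Each list element holds: full match, group 1, group 2
m <- regmatches(x, regexec(pat, x))

# Keep only the two groups and convert to numeric, one row per input string
res <- do.call(rbind, lapply(m, function(v) as.numeric(v[2:3])))
colnames(res) <- c("value1", "value2")
res
```

This gives a numeric matrix rather than a data.frame; strcapture does the conversion to the column types of the prototype data.frame for you.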
3) read.pattern Use read.pattern from gsubfn. This uses the same regex as in (2). It automatically determines that the columns are numeric and uses the same text= and col.names= arguments as read.table, making them easy to remember if you are familiar with that function.
library(gsubfn)
read.pattern(text = a$value, pattern = "([0-9.]+)[^0-9.]+([0-9.]+).*",
             col.names = c("value1", "value2"))
## value1 value2
## 1 201.1 202.00
## 2 301.0 302.22
Note The input from the question, in reproducible form:
library(tibble)
a <- as_tibble(c("201.1, 202 (abc) 203, 204", "301 / 302.22 def, 303"))