I have a column that I am trying to split into two while retaining the delimiter. I got this far, but part of the delimiter is being dropped. I also need to do this split a second time with the delimiter attached to the first column, which I cannot figure out how to do.
library(tidyr)

duplicates <- data.frame(sample = c("a_1_b1", "a1_2_b1", "a1_c_1_b2"))
duplicates <- separate(duplicates,
                       sample,
                       into = c("strain", "sample"),
                       sep = "_(?=[:digit:])")
Using only the first name as an example, my output is a_1 and b1, while my desired output is a_1 and _b1.
I would also like to perform this split with the delimiter added to the first column as below.
| sample | batch |
|---|---|
| a_1_ | b1 |
| a1_2_ | b1 |
| a1_c_1_ | b2 |
Edit: This post does not answer my question of how to retain the delimiter, or to control which side of the split it ends up on.
You can use tidyr::extract with capture groups.
tidyr::extract(duplicates, sample, c("strain", "sample"), '(.*_)(\\w+)')
# strain sample
#1 a_1_ b1
#2 a1_2_ b1
#3 a1_c_1_ b2
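The pattern above leaves the underscore at the end of the first column. As a minimal sketch of the other case asked about (delimiter at the start of the second column), the underscore can be moved into the second capture group. Using `[^_]+` in place of `\\w+` is my own choice here, made so that the split can only happen at the last underscore; it assumes the final piece never contains an underscore.

```r
library(tidyr)

duplicates <- data.frame(sample = c("a_1_b1", "a1_2_b1", "a1_c_1_b2"))

# Group 1: everything before the last underscore.
# Group 2: the last underscore plus the trailing piece, so the
# delimiter ends up in the second column.
right <- extract(duplicates, sample, c("strain", "sample"),
                 "^(.*)(_[^_]+)$")
print(right)
```

For the first row this should give a_1 in strain and _b1 in sample. Which side the delimiter lands on is decided only by which pair of parentheses contains the underscore.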
The same regex can also be used with strcapture in base R:
strcapture('(.*_)(\\w+)', duplicates$sample,
proto = list(strain = character(), sample = character()))
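If both variants of the split are needed, the choice of side could be wrapped in a small function. The sketch below uses only base R; `split_last_underscore` and its `keep` argument are hypothetical names introduced for illustration, and the patterns assume the split should always happen at the last underscore.

```r
# Hypothetical helper (not part of base R or tidyr): split each string at its
# last underscore and keep that underscore on the requested side.
split_last_underscore <- function(x, keep = c("left", "right")) {
  keep <- match.arg(keep)
  pattern <- if (keep == "left") {
    "^(.*_)([^_]+)$"   # underscore stays at the end of the first column
  } else {
    "^(.*)(_[^_]+)$"   # underscore moves to the start of the second column
  }
  strcapture(pattern, x,
             proto = data.frame(strain = character(),
                                sample = character(),
                                stringsAsFactors = FALSE))
}

x <- c("a_1_b1", "a1_2_b1", "a1_c_1_b2")
left  <- split_last_underscore(x, keep = "left")
right <- split_last_underscore(x, keep = "right")
print(left)
print(right)
```

Strings without any underscore do not match either pattern, and strcapture returns NA for them in both columns.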