My gratitude in advance for any help, and apologies for not being able to figure this out from other examples.
I have a vector containing names of files such as: vec = c("Img_1_(set1)_2L4_s.ext", "Img_37_(set19)_2R4_s.ext", "Img_187_(set94)_4L4_s.ext", "Img_77_(set39)_4R2_s.ext")
I want to create two separate additional vectors by extracting:
1. The key letter (either L or R) that sits between two numbers, which vary from case to case. e.g., result: L,R,L,R
2. The "set" string plus the number attached to it (which varies across cases) between brackets, both with and without the brackets. e.g., result1: (set1), (set19), (set94), (set39); result2: set1, set19, set94, set39
Ideally using stringr, but I'm open to other (simpler?) libraries/functions.
For case 1., I tried str_extract(vec, "(?<= \\)_)[0-9]*") as a way to match the ")_" pattern followed by a number [0-9], but all I get in return are NAs (I suspect I'm not passing the ")" pattern correctly).
For case 2., I had to make do by simply extracting the set numbers with str_extract(vec, "(?<=set)[0-9]*") and then creating another variable by pasting the word "set" back on; obviously not ideal with large data frames.
The set pattern is nice and easy: the letters "set" followed by one or more digits "[0-9]+".
At least for your examples, it seems like the letters L and R don't show up anywhere else, so we can use a very simple pattern for them too and just look for an L or an R: "L|R".
library(stringr)
set = str_extract(vec, pattern = "set[0-9]+")
main = str_extract(vec, pattern = "L|R")
set
# [1] "set1" "set19" "set94" "set39"
main
# [1] "L" "R" "L" "R"
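Since you mentioned being open to other functions: here is a minimal sketch of the same two extractions in base R, using regexpr() with regmatches(). The names set_base and main_base are just placeholders for this example. One difference to keep in mind is that regmatches() drops elements with no match, whereas str_extract() returns NA for them, so this version only lines up with vec if every file name matches.

```r
# Same example vector as in the question
vec <- c("Img_1_(set1)_2L4_s.ext", "Img_37_(set19)_2R4_s.ext",
         "Img_187_(set94)_4L4_s.ext", "Img_77_(set39)_4R2_s.ext")

# regexpr() finds the first match in each string;
# regmatches() pulls the matched text out
set_base  <- regmatches(vec, regexpr("set[0-9]+", vec))
main_base <- regmatches(vec, regexpr("[LR]", vec))
```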
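If you're worried about false hits because an L or R might show up elsewhere in the input, you could make the pattern more specific, for example by looking behind for a digit "(?<=[0-9])" and looking ahead for a digit "(?=[0-9])". Put the two letters in a character class "[LR]" so that both lookarounds apply to either letter; with a bare "L|R" the alternation would split the pattern into "L after a digit" or "R before a digit":
main2 = str_extract(vec, pattern = "(?<=[0-9])[LR](?=[0-9])")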
main2
# [1] "L" "R" "L" "R"
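A note on precedence, since it is easy to trip over: "|" binds more loosely than everything around it, so a pattern written as "(?<=[0-9])L|R(?=[0-9])" means "an L preceded by a digit, or an R followed by a digit", not "an L or R between two digits". The sketch below shows the difference on a made-up file name (tricky is hypothetical, not one of your files) where an R directly precedes a digit early in the string.

```r
library(stringr)

# Hypothetical file name: "R1" appears before the real "2L4" key
tricky <- "ImgR1_(set1)_2L4_s.ext"

# Alternation splits the pattern in two, so the early "R" (followed by a digit) wins
ungrouped <- str_extract(tricky, "(?<=[0-9])L|R(?=[0-9])")

# A character class keeps both lookarounds attached to either letter
grouped <- str_extract(tricky, "(?<=[0-9])[LR](?=[0-9])")

ungrouped
# [1] "R"
grouped
# [1] "L"
```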
And if you do want the parentheses with the set, escape them with a double backslash so they are matched literally and included in the result:
set_with_paren = str_extract(vec, pattern = "\\(set[0-9]+\\)")
set_with_paren
# [1] "(set1)" "(set19)" "(set94)" "(set39)"
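If you'd like all the pieces from a single pass, one option is str_match() with capture groups. This is only a sketch, and it assumes every file name follows the "(setN)_<digits><L or R><digits>" layout of your examples; the names m, set_all and main_all are placeholders. str_match() returns a character matrix whose first column is the full match and whose remaining columns are the groups, with NA rows for strings that don't match.

```r
library(stringr)

vec <- c("Img_1_(set1)_2L4_s.ext", "Img_37_(set19)_2R4_s.ext",
         "Img_187_(set94)_4L4_s.ext", "Img_77_(set39)_4R2_s.ext")

# Group 1: "set" plus its number (without the parentheses)
# Group 2: the L or R between the two numbers that follow
m <- str_match(vec, "\\((set[0-9]+)\\)_[0-9]+([LR])[0-9]+")

set_all  <- m[, 2]
main_all <- m[, 3]

set_all
# [1] "set1"  "set19" "set94" "set39"
main_all
# [1] "L" "R" "L" "R"
```

Because the result stays aligned with vec row by row, the columns can be assigned straight into a data frame.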