I have this challenge: I want to extract the part of each string that comes before the first dot (or the whole string if it has no dot). For example, given:
```r
test <- c("This_This-This.Not This",
          "This_This-This.not_.this",
          "This_This-This",
          "this",
          "this.Not This")
```
Since I need to use a regex, I have been trying to use this expression:
```r
str_match(test,"(^[a-zA-Z].+)[\\.\\b]?")[,2]
```
but what I get is:
```
> str_match(test,"(^[a-zA-Z].+)[\\.\\b]?")[,2]
[1] "This_This-This.Not This" "This_This-This.not_.this"
[3] "This_This-This" "this"
[5] "this.Not This"
```
My desired output is:
```
"This_This-This"
"This_This-This"
"This_This-This"
"this"
"this"
```
This is my thought process behind the regex:
```r
str_match(test,"(^[a-zA-Z].+)[\\.\\b]?")[,2]
```
(^[a-zA-Z].+) : capture the group before the dot. The string always starts with a letter (upper or lower case), followed by any other characters, which is why I use .+
[\.\b]? : a dot or a word boundary, which may or may not be present, which is why I use ?
This is not giving what I want. Where is my mistake?
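A quick check on a single string shows where the pattern goes wrong. This is a minimal sketch using stringr's `str_match`; the variable `x` and the two variant patterns are illustrative, and `\\b` is left out of the bracket expression because a bracket expression matches one character, not a zero-width position such as a word boundary. The greedy `.+` takes everything, a lazy `.+?` followed by an optional tail stops after a single character, and only a lazy quantifier followed by a mandatory terminator gives the intended result.

```r
library(stringr)

x <- "This_This-This.Not This"

# Greedy: .+ runs to the end of the string; the optional [\\.]? then matches
# the empty string, so the whole input ends up in group 1.
str_match(x, "(^[a-zA-Z].+)[\\.]?")[, 2]
#> [1] "This_This-This.Not This"

# Lazy but with an optional tail: .+? takes the minimum (one character),
# the optional [\\.]? matches nothing, and the match is already complete.
str_match(x, "(^[a-zA-Z].+?)[\\.]?")[, 2]
#> [1] "Th"

# Lazy with a mandatory terminator (a dot or the end of the string):
# .*? keeps growing until (\\.|$) can match, i.e. up to the first dot.
str_match(x, "(^[a-zA-Z].*?)(\\.|$)")[, 2]
#> [1] "This_This-This"
```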
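The mistake is that `.+` is greedy and everything after it is optional: `.+` consumes the rest of the string, and `[\\.\\b]?` then matches the empty string, so the whole input lands in the capture group. (A bracket expression also matches exactly one character, so `\\b` inside it cannot mean "word boundary".) What you need is a lazy quantifier followed by a terminator that must match. My regex is "match anything up to either a dot or the end of the line":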
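```r
library(stringr)
# (.*?) is lazy, so it stops at the first dot; if there is no dot, (\\.|$) matches the end of the string
str_match(test, "^(.*?)(\\.|$)")[, 2]
```
Result:
```
[1] "This_This-This" "This_This-This" "This_This-This" "this" "this"
```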
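If a capture group is not needed, the same result can probably be had more directly. The sketch below shows two alternatives (not part of the answer above): `str_extract` with a negated bracket expression, and base R `sub` deleting everything from the first dot onward. Both return the five desired strings for this `test` vector.

```r
library(stringr)

test <- c("This_This-This.Not This",
          "This_This-This.not_.this",
          "This_This-This",
          "this",
          "this.Not This")

# Option 1 (stringr): take the leading run of characters that are not dots.
# [^.] inside brackets means "any character except a literal dot".
str_extract(test, "^[^.]+")

# Option 2 (base R): delete everything from the first dot to the end.
# Strings without a dot are returned unchanged.
sub("\\..*", "", test)
```

One difference to keep in mind: for a string that starts with a dot, `str_extract(test, "^[^.]+")` returns `NA` because `[^.]+` needs at least one character, while `sub` returns an empty string.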