I am trying to remove hashtags from the end of strings in R. For example:
x<- "I didn't know it could be #boring. guess I need some fun #movie #lateNightThoughts"
I want to remove the hashtags at the end of the string, which are #movie and #lateNightThoughts. Expected result:
- "I didn't know it could be #boring. guess I need some fun"
I tried:
stringi::stri_replace_last_regex(x,'#\\S+',"")
but it removes only the very last hashtag.
- "I didn't know it could be #boring. guess I need some fun #movie "
Any idea how to get the expected result?
Edit:
How about removing a hashtag from the beginning of the text? E.g.:
x<- "#Thomas20 I didn't know it could be #boring. guess I need some fun #movie #lateNightThoughts"
You may use
> x<- "I didn't know it could be #boring. guess I need some fun #movie #lateNightThoughts"
> sub("\\s*\\B#\\w+(?:\\s*#\\w+)*\\s*$", "", x)
[1] "I didn't know it could be #boring. guess I need some fun"
Or, if you do not care about the context of the first # you want to start matching from, you may even use
sub("(?:\\s*#\\w+)+\\s*$", "", x)
See the regex demo.
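To see where the two patterns differ, here is a small sketch on a made-up input (the string "see issue#12 #bug" is hypothetical and not from the question). It adds perl = TRUE, which the calls above do not use, so that the PCRE engine is selected explicitly.

```r
# Hypothetical input: one "#" glued to a word, one real trailing hashtag.
x2 <- "see issue#12 #bug"

# With \B, the "#" in "issue#12" follows a word character, so the match
# cannot start there; only " #bug" is removed.
with_b <- sub("\\s*\\B#\\w+(?:\\s*#\\w+)*\\s*$", "", x2, perl = TRUE)
print(with_b)
# [1] "see issue#12"

# Without \B, the match may start at any "#", so "#12 #bug" is removed.
without_b <- sub("(?:\\s*#\\w+)+\\s*$", "", x2, perl = TRUE)
print(without_b)
# [1] "see issue"
```

If a "#" can only ever appear at the start of a hashtag in your data, both patterns should give the same result.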
Details
\s* - zero or more whitespaces
\B - right before the current location, there can be start of string or a non-word char (this is usually used to ensure you do not match # inside a "word", so if you do not need it, you may remove this non-word boundary)
# - a # char
\w+ - 1 or more word chars (letters, digits or _)
(?:\s*#\w+)* - zero or more occurrences of:
    \s* - zero or more whitespaces
    # - a # char
    \w+ - 1+ word chars
\s* - zero or more whitespaces
$ - end of string.
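For the case raised in the Edit (hashtags at the beginning of the text), the same idea can be mirrored by anchoring at the start of the string with ^ instead of at the end with $. The following is only a sketch: the helper names strip_trailing_hashtags and strip_leading_hashtags are made up for illustration, and perl = TRUE is an added choice to use PCRE explicitly.

```r
# Hypothetical helpers, not part of base R or any package.

# Remove one or more hashtags (and surrounding whitespace) at the end of the string.
strip_trailing_hashtags <- function(s) {
  sub("\\s*\\B#\\w+(?:\\s*#\\w+)*\\s*$", "", s, perl = TRUE)
}

# Remove one or more hashtags (and the whitespace after them) at the start of the string.
strip_leading_hashtags <- function(s) {
  sub("^\\s*(?:#\\w+\\s*)+", "", s, perl = TRUE)
}

x <- "#Thomas20 I didn't know it could be #boring. guess I need some fun #movie #lateNightThoughts"

leading_removed <- strip_leading_hashtags(x)
print(leading_removed)
# [1] "I didn't know it could be #boring. guess I need some fun #movie #lateNightThoughts"

both_removed <- strip_trailing_hashtags(leading_removed)
print(both_removed)
# [1] "I didn't know it could be #boring. guess I need some fun"
```

Both helpers leave hashtags in the middle of the text, such as #boring, untouched, because each pattern is anchored to one end of the string.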