Using a regex, I am trying to keep only the hashtags in a text. My idea is to match everything else and replace it with a non-significant group. But maybe there is a smarter approach.
Example text:
This is a #text, which is #full of #hashtags.
Well, this is not #easy to
#extract #them.
I think I #start to lose: #hope.
My best try: ([\s\.\,]|^)[^#]\w+([\s\.\,]*?|$)
Replacing with $2
returns
a #text #full #hashtags #easy
#extract #them. I #start: #hope.
The expected result should have 4 rows, as in the example. Spaces can also stay.
Ideal desired result:
#text #full #hashtags
#easy
#extract #them
#start #hope
If you don't mind keeping the leading spaces (which you might trim afterwards), you could use group 1 in the replacement and match:
.*?(\s*#\w+)|.+
The pattern matches:
.*?  Match any char except a newline, as few times as possible
(\s*#\w+)  Capture in group 1: optional whitespace chars, then # and 1+ word chars
|  Or
.+  Match 1+ times any char except a newline
If you don't want to keep the leading spaces but don't mind trailing spaces, you can use group 1 followed by a space in the replacement and match:
.*?(#\w+)|.+
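To try both variants on the example text, here is a minimal sketch in Python (an assumption, since the question does not name a language or tool). Python writes the group reference as \1 rather than $1, and since Python 3.5 a group that did not take part in the match is replaced with an empty string, which is what makes the |.+ branch delete the leftovers. The tidy helper is a hypothetical name used only for this illustration.

```python
import re

text = (
    "This is a #text, which is #full of #hashtags.\n"
    "Well, this is not #easy to\n"
    "#extract #them.\n"
    "I think I #start to lose: #hope."
)

# Variant 1: keep the whitespace in front of each hashtag (group 1).
with_leading = re.sub(r".*?(\s*#\w+)|.+", r"\1", text)

# Variant 2: capture only the hashtag and append a space in the replacement.
with_trailing = re.sub(r".*?(#\w+)|.+", r"\1 ", text)


def tidy(s):
    """Hypothetical helper: strip leftover whitespace at both ends of every line."""
    return [line.strip() for line in s.split("\n")]


print(tidy(with_leading))
print(tidy(with_trailing))
```

After trimming, both variants should give the four desired rows. Without the trim step, the first variant leaves leading spaces and the second leaves trailing spaces, as described above.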