regex hashtag regexp-replace

Regex to only keep hashtags


Using a regex, I am trying to keep only the hashtags in a text. My approach is to match everything else and replace it with a captured group, but maybe there is a smarter approach.

Example text:

This is a #text, which is #full of #hashtags.
Well, this is not #easy to
#extract #them. 
I think I #start to lose: #hope.

My best try: ([\s\.\,]|^)[^#]\w+([\s\.\,]*?|$)

Replacing with $2 returns

 a #text #full #hashtags #easy
#extract #them. I #start: #hope.

The expected result should have 4 rows, as in the example. Spaces can also stay.
Ideal desired result:

#text #full #hashtags
#easy
#extract #them
#start #hope



Solution

  • If you don't mind keeping the leading spaces (which you might trim afterwards), you could use group 1 in the replacement and match:

    .*?(\s*#\w+)|.+
    

    The pattern matches:

    • .*? Match any character except a newline, as few times as possible
    • (\s*#\w+) Capture in group 1: optional whitespace chars (which may include a newline), then # and 1+ word chars
    • | Or
    • .+ Match any character except a newline, 1+ times (the rest of a line that has no further hashtag)
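As a rough illustration of the pattern above, here is a minimal Python sketch. The answer does not name a regex flavor, so Python's `re` module is an assumption, and the names `KEEP_HASHTAGS`, `keep_hashtags` and `sample` are hypothetical. A replacement function stands in for the `$1` replacement so that matches of the `.+` alternative, where group 1 did not participate, become an empty string.

```python
import re

# Pattern from the answer: capture optional whitespace plus a hashtag,
# or consume the rest of the line when no further hashtag follows.
KEEP_HASHTAGS = re.compile(r".*?(\s*#\w+)|.+")


def keep_hashtags(text: str) -> str:
    # Group 1 is None when the `.+` alternative matched, so fall back to "".
    replaced = KEEP_HASHTAGS.sub(lambda m: m.group(1) or "", text)
    # Trim the leading space that the pattern leaves at the start of some lines.
    return "\n".join(line.strip() for line in replaced.split("\n"))


sample = (
    "This is a #text, which is #full of #hashtags.\n"
    "Well, this is not #easy to\n"
    "#extract #them. \n"
    "I think I #start to lose: #hope."
)
print(keep_hashtags(sample))
```

On the example text from the question this prints the four expected rows. The per-line `strip()` is the "trim afterwards" step mentioned above and is not part of the regex itself.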


    If you don't want to keep the leading spaces but don't mind trailing spaces, you can use group 1 followed by a space in the replacement and match:

    .*?(#\w+)|.+
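A similar sketch for this second variant, again assuming Python's `re` module, with hypothetical names (`KEEP_HASHTAGS_NO_LEAD`, `keep_hashtags_trailing`). The replacement is group 1 followed by a space, so every match, including the `.+` alternative, contributes one trailing space.

```python
import re

# Second pattern from the answer: the capture holds only the hashtag itself.
KEEP_HASHTAGS_NO_LEAD = re.compile(r".*?(#\w+)|.+")


def keep_hashtags_trailing(text: str) -> str:
    # Replacement is "group 1 followed by a space"; group 1 is None
    # when the `.+` alternative matched, so only the space is emitted.
    replaced = KEEP_HASHTAGS_NO_LEAD.sub(
        lambda m: (m.group(1) or "") + " ", text
    )
    # Optional clean-up: drop the trailing spaces on each line.
    return "\n".join(line.rstrip() for line in replaced.split("\n"))


sample = (
    "This is a #text, which is #full of #hashtags.\n"
    "Well, this is not #easy to\n"
    "#extract #them. \n"
    "I think I #start to lose: #hope."
)
print(keep_hashtags_trailing(sample))
```

Because this pattern has no `\s*` inside the group, a match never crosses a newline, so the line breaks of the input are left untouched and only trailing spaces need trimming.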
    
