Tags: regex, python-re, hashtag

How to remove # from hashtag using Python RegEx


I need to remove the leading "#" symbol from hashtags in a text. For example, the sentence I'm feeling #blessed. should become I'm feeling blessed.

I have written this function, but I'm sure I can achieve the same thing with simpler logic using a regex.

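  # The original snippet had no "def" line; remove_hash is a placeholder name.
  def remove_hash(sentence):
    clean_sentence = ""
    space = " "
    for token in sentence.split():
      # Use == here: "is" tests object identity, not string equality.
      if token[0] == '#':
        token = token[1:]
      clean_sentence += token + space
    return clean_sentence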

Need help here!!


Solution

  • The regex provided by @Tim, #(\S+), would also match a # that is not at the start of a token, i.e. one that has another non-whitespace character \S directly before it, as in so#blessed.

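    We can prevent this by adding a negative lookbehind (?<!\S) before the hash, so that the hash cannot be preceded by a non-whitespace character. The lookbehind also succeeds at the very start of the string, so a leading hashtag is still matched.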

    import re

    inp = "#I'm #feeling #blessed so#blessed .#here#."
    # Strip "#" only when it starts a token (no non-whitespace character before it).
    output = re.sub(r'(?<!\S)#(\S+)', r'\1', inp)
    print(output)
    

    output:

    I'm feeling blessed so#blessed .#here#.
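
To make the difference between the two patterns easier to see, here is a minimal sketch that runs both on the same input. The names NAIVE, ANCHORED and strip_hashtags are hypothetical and not part of the original answer; the patterns themselves are the ones discussed above.

```python
import re

# Hypothetical names for illustration; only the two patterns come from the answer.
NAIVE = re.compile(r'#(\S+)')            # matches "#" anywhere, even mid-token
ANCHORED = re.compile(r'(?<!\S)#(\S+)')  # matches "#" only at the start of a token


def strip_hashtags(text, pattern=ANCHORED):
    """Remove the '#' matched by the given pattern, keeping the tag text."""
    return pattern.sub(r'\1', text)


sample = "I'm feeling #blessed so#blessed"
print(strip_hashtags(sample, NAIVE))     # I'm feeling blessed soblessed
print(strip_hashtags(sample, ANCHORED))  # I'm feeling blessed so#blessed
```

With the naive pattern the # inside so#blessed is removed as well, while the lookbehind version leaves it alone. Unlike the loop in the question, re.sub also preserves the original spacing and does not append a trailing space.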