
Delete or replace all characters inside # in Python


I want to remove all characters between the # markers in a text. These are the regexes I used:

text = text.replace('\xa0', ' ')
text = re.sub(r"#\ [A-Za-z]*\# ", " ", text)
text = re.sub(r"#[A-Za-z]*\# ", " ", text)

Example data:

'Please direct all communications to the HR Department within Refined Resources (#URL_80d75e0d07ca8b108539318a0443bfe5d1ff472afa0c4540b77079c5d5f31eee#)\xa0#EMAIL_0b13a2cfd4718ce252c09b2353d692a73bd32552e922c5db6cad5fb7e9a2c6c3#Darren Lawson | VP of Recruiting |\xa0#EMAIL_395225df8eed70288fc67310349d63d49d5f2ca6bc14dbb5dcbf9296069ad88c#\xa0| #PHONE_70128aad0c118273b0c2198a08d528591b932924e165b6a8d1272a6f9e2763d1#'

But nothing was replaced except \xa0, which was replaced with " ". How can I get output like this:

'Please direct all communications to the HR Department within Refined Resources ()  Darren Lawson | VP of Recruiting | | '

I also tried the pattern "# \S+ \#" with "" as the replacement, but nothing happened either. How can I replace all the characters inside the hash signs?


Solution

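  • You could write a single re.sub call with an alternation, matching all the allowed characters between the # signs. The original patterns fail because [A-Za-z] excludes the digits and underscores in the tokens, and because they require a literal space after the closing #, which is not always present: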

    import re
    
    text = 'Please direct all communications to the HR Department within Refined Resources (#URL_80d75e0d07ca8b108539318a0443bfe5d1ff472afa0c4540b77079c5d5f31eee#)\xa0#EMAIL_0b13a2cfd4718ce252c09b2353d692a73bd32552e922c5db6cad5fb7e9a2c6c3#Darren Lawson | VP of Recruiting |\xa0#EMAIL_395225df8eed70288fc67310349d63d49d5f2ca6bc14dbb5dcbf9296069ad88c#\xa0| #PHONE_70128aad0c118273b0c2198a08d528591b932924e165b6a8d1272a6f9e2763d1#'
    text = re.sub(r"#[A-Za-z0-9_]*#\s*|\xa0", " ", text)
    print(text)
    

    Output

    Please direct all communications to the HR Department within Refined Resources ( )  Darren Lawson | VP of Recruiting |  |
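
If the goal is to drop the tokens entirely and tidy the leftover spacing, a variation along these lines may help. It is only a sketch: the helper name strip_tokens is hypothetical, the token shape (uppercase name, underscore, lowercase hex digest) is assumed from the sample data, and the sample string is shortened for readability.

```python
import re

# Assumed token shape, inferred from the sample data: #NAME_hexdigest#
TOKEN = re.compile(r"#[A-Z]+_[0-9a-f]+#")


def strip_tokens(text: str) -> str:
    """Remove #...# tokens, then normalize the spacing left behind."""
    text = text.replace("\xa0", " ")   # non-breaking space -> plain space
    text = TOKEN.sub("", text)         # delete each token outright
    # Collapse runs of spaces and trim the ends.
    return re.sub(r" {2,}", " ", text).strip()


# Shortened, made-up sample in the same shape as the data in the question.
sample = "Contact (#URL_80d7#)\xa0#EMAIL_0b13#Darren Lawson |\xa0#EMAIL_3952#\xa0| #PHONE_7012#"

# The pattern from the question finds nothing: "_" and digits are not in [A-Za-z].
print(re.search(r"#[A-Za-z]*\# ", sample))  # None

print(strip_tokens(sample))  # Contact () Darren Lawson | |
```

Deleting the token instead of replacing it with a space keeps "()" intact, and the final collapse step avoids the doubled spaces seen in the output above.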