I want to remove all characters between the # markers in a text. This is the code I used:
text = text.replace('\xa0', ' ')
text = re.sub(r"#\ [A-Za-z]*\# ", " ", text)
text = re.sub(r"#[A-Za-z]*\# ", " ", text)
Example data:
'Please direct all communications to the HR Department within Refined Resources (#URL_80d75e0d07ca8b108539318a0443bfe5d1ff472afa0c4540b77079c5d5f31eee#)\xa0#EMAIL_0b13a2cfd4718ce252c09b2353d692a73bd32552e922c5db6cad5fb7e9a2c6c3#Darren Lawson | VP of Recruiting |\xa0#EMAIL_395225df8eed70288fc67310349d63d49d5f2ca6bc14dbb5dcbf9296069ad88c#\xa0| #PHONE_70128aad0c118273b0c2198a08d528591b932924e165b6a8d1272a6f9e2763d1#'
But nothing was replaced except the \xa0, which was replaced with " ". How can I get output like this:
'Please direct all communications to the HR Department within Refined Resources () Darren Lawson | VP of Recruiting | | '
I also tried the pattern "# \S+ \#" with "" as the replacement, but nothing happened either. How do I replace all the characters between the hash signs?
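Your patterns match nothing because [A-Za-z]* cannot match the digits and underscores inside tokens like #EMAIL_0b13...#, and each pattern also requires a literal space next to a # that is not there in the data.
You could write a single re.sub with an alternation: the first branch matches a #, then all the allowed characters (letters, digits and underscores), then the closing # plus any whitespace after it; the second branch matches a lone \xa0. Every match is replaced with a single space.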
import re
text = 'Please direct all communications to the HR Department within Refined Resources (#URL_80d75e0d07ca8b108539318a0443bfe5d1ff472afa0c4540b77079c5d5f31eee#)\xa0#EMAIL_0b13a2cfd4718ce252c09b2353d692a73bd32552e922c5db6cad5fb7e9a2c6c3#Darren Lawson | VP of Recruiting |\xa0#EMAIL_395225df8eed70288fc67310349d63d49d5f2ca6bc14dbb5dcbf9296069ad88c#\xa0| #PHONE_70128aad0c118273b0c2198a08d528591b932924e165b6a8d1272a6f9e2763d1#'
# Replace each #TOKEN# (with any whitespace after it) or any lone \xa0 with one space
text = re.sub(r"#[A-Za-z0-9_]*#\s*|\xa0", " ", text)
print(text)
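Output (where two replacements are adjacent a run of two spaces remains, and the line ends with two trailing spaces):
Please direct all communications to the HR Department within Refined Resources ( )  Darren Lawson | VP of Recruiting |  |  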
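If you need exactly the output from the question (empty parentheses and single spaces), a minimal sketch is to delete the tokens instead of replacing them with a space, convert the \xa0 characters, and then collapse runs of spaces. The helper name strip_hash_tokens is hypothetical, and it assumes tokens are a # followed by at least one word character and a closing #. Note that \w in Python 3 also matches non-ASCII letters, so it is slightly broader than [A-Za-z0-9_].

```python
import re

# The sample string from the question, split across lines with implicit concatenation
text = (
    'Please direct all communications to the HR Department within Refined Resources '
    '(#URL_80d75e0d07ca8b108539318a0443bfe5d1ff472afa0c4540b77079c5d5f31eee#)'
    '\xa0#EMAIL_0b13a2cfd4718ce252c09b2353d692a73bd32552e922c5db6cad5fb7e9a2c6c3#'
    'Darren Lawson | VP of Recruiting |'
    '\xa0#EMAIL_395225df8eed70288fc67310349d63d49d5f2ca6bc14dbb5dcbf9296069ad88c#'
    '\xa0| #PHONE_70128aad0c118273b0c2198a08d528591b932924e165b6a8d1272a6f9e2763d1#'
)


def strip_hash_tokens(s: str) -> str:
    """Hypothetical helper: remove #TOKEN# markers and normalise spaces."""
    # 1. Delete every #...# token made of word characters (letters, digits, _)
    s = re.sub(r"#\w+#", "", s)
    # 2. Turn non-breaking spaces into ordinary spaces
    s = s.replace("\xa0", " ")
    # 3. Collapse runs of two or more spaces into a single space
    s = re.sub(r" {2,}", " ", s)
    return s


result = strip_hash_tokens(text)
print(repr(result))
```

Deleting first and collapsing afterwards is what turns "(#URL_...#)" into "()" rather than "( )"; if you prefer the space inside the parentheses, keep the single re.sub from the answer and only add step 3.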