python regex preprocessor

Removing words that contain digits using regular expressions not working as expected


import re
text = """Why is this $[...] when the same product is available for $[...] here?<br />
http://www.amazon.com/VICTOR-FLY-MAGNET-BAIT-REFILL/dp/B00004RBDY<br /><br />
The Victor M380 and M502 traps are unreal, of course -- total fly genocide. 
Pretty stinky, but only right nearby. won't, can't iamwordwith4number 234f  ther was a word withnumber before me"""

sentense1 = re.sub(r"\S*\d+\S*", "", text)  # removes words which has digits in it.
sentense1 = re.sub('[^A-Za-z0-9]+', " ", text)  # removes punctuations.
print(sentense1)

I am trying to remove words that contain digits. For example, the text above contains words such as iamwordwith4number and 234f, and I want to remove them. It works if I comment out the second regular expression line, so I'm not sure whether the two lines depend on each other. Can you please advise?


Solution

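  • Your second substitution is applied to text rather than sentense1, so it starts again from the original string and discards the result of the first substitution. The second regular expression should be like this: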

    sentense1 = re.sub('[^A-Za-z0-9]+', " ", sentense1)  # removes punctuations.
    

    Instead of this:

    sentense1 = re.sub('[^A-Za-z0-9]+', " ", text)  # removes punctuations.
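
The fix above can be sketched as a small function in which each step receives the output of the previous one. The function name remove_words_with_digits and the shortened sample string are assumptions made for illustration; the two patterns are the ones from the question.

```python
import re


def remove_words_with_digits(text):
    """Drop tokens containing digits, then replace punctuation with spaces."""
    # Step 1: remove every whitespace-delimited token that contains a digit.
    without_digit_words = re.sub(r"\S*\d+\S*", "", text)
    # Step 2: operate on the result of step 1 (not on the original text),
    # collapsing each run of non-alphanumeric characters into one space.
    cleaned = re.sub(r"[^A-Za-z0-9]+", " ", without_digit_words)
    return cleaned.strip()


# Shortened sample based on the last line of the question's text.
sample = "won't, can't iamwordwith4number 234f ther was a word"
print(remove_words_with_digits(sample))  # won t can t ther was a word
```

Note that the first pattern removes any whitespace-delimited token containing a digit, so with the full text from the question it would also remove the URL token, whose product ID contains digits.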