
Python 3 regex string matching that ignores whitespace and string.punctuation


I am new to regex and would like to know how to match one string against another. The use case is finding a certain phrase in some text. I'm using Python 3.7, if that makes a difference.

phrase = "some phrase" #the phrase I'm searching for

Possible matches:

text = "some#@$#phrase"
            ^^^^ #non-alphanumeric can be treated like a single space
text = "some   phrase"
text = "!!!some!!! phrase!!!"

These are not matches:

text = "some phrases"
                   ^ #the 's' on the end makes it false
text = "ssome phrase"
text = "some other phrase"

I have tried using something like:

re.search(r'\b'+phrase+'\b', text)

I would very much appreciate an explanation of why the regex works if you provide a valid solution.


Solution

  • You should use something like this:

    re.search(r'\bsome\W+phrase\b', text)
    
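    • '\b' means a word boundary: the position between a word character and a non-word character (or the start or end of the string). This is what rejects "ssome phrase" and "some phrases".

    • '\W' means a non-word character, i.e. anything that is not a letter, a digit, or an underscore

    • '+' means one or more times, so '\W+' accepts any run of spaces and punctuation between the two words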
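As a quick check, the pattern can be run against the example strings from the question. This is a minimal sketch; the list names `matches` and `non_matches` are only illustrative.

```python
import re

# The pattern from the answer: word boundary, "some", one or more
# non-word characters, "phrase", word boundary.
pattern = re.compile(r'\bsome\W+phrase\b')

# Example strings taken from the question.
matches = ["some#@$#phrase", "some   phrase", "!!!some!!! phrase!!!"]
non_matches = ["some phrases", "ssome phrase", "some other phrase"]

for text in matches + non_matches:
    # search() returns a match object or None; bool() turns that into True/False.
    print(repr(text), bool(pattern.search(text)))
```

The first three strings should report True and the last three False.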

    If the phrase is stored in a variable, you can first replace each space in it with '\W+' and then build the pattern from the result:

    some_phrase = some_phrase.replace(r' ', r'\W+')
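Putting the pieces together might look like the sketch below. The helper name `phrase_pattern` is hypothetical, and the use of `re.escape` is an addition that the answer does not mention; it is there in case the phrase contains characters that are special in a regex. The sketch also assumes the phrase starts and ends with a word character, since '\b' only makes sense next to one. Note that in the attempt from the question, the second '\b' is not a raw string, so Python reads it as a backspace character rather than a word boundary; every piece of the pattern needs the r prefix.

```python
import re

def phrase_pattern(phrase):
    """Hypothetical helper: compile a regex that matches `phrase` as whole words."""
    # Escape each word so characters such as '.' or '+' are matched literally.
    words = [re.escape(word) for word in phrase.split()]
    # Allow one or more non-word characters between consecutive words,
    # and require a word boundary on both ends. Every piece is a raw string.
    return re.compile(r'\b' + r'\W+'.join(words) + r'\b')

phrase = "some phrase"
pattern = phrase_pattern(phrase)

print(pattern.pattern)                                # the generated regex
print(bool(pattern.search("!!!some!!! phrase!!!")))   # a match from the question
print(bool(pattern.search("some phrases")))           # a non-match from the question

# Without the r prefix, '\b' is the single backspace character.
print('\b' == '\x08')
```

Splitting on whitespace instead of replacing a single space also tolerates phrases that were typed with several spaces between the words.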