
Python: search for a word containing ':' in a string


I am trying to check whether a word exists in a string. The problem is that the search word contains the character ':', and the search fails even though I escape the word. In the example below, searching for 'decision :' reports "not exist" although the word does appear in the text.

The search must match whole words only: for example, searching for 'for' must report "not exist" when the text only contains the word 'formatted'.
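To illustrate the whole-word requirement: a plain substring test with "in" finds 'for' inside 'formated', while \b on both sides only accepts 'for' as a separate word. A minimal sketch using a shortened version of the test text from the question:

```python
import re

text = "a formated test text"

# Substring test: 'for' is found inside 'formated', which is not wanted here
print("for" in text)  # True

# \b on both sides requires 'for' to stand as a whole word
print(re.search(r"\bfor\b", text) is not None)  # False
```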

import re

texte = "  hello \n a formated test text   \n decision :   repair \n toto \n titi"
word_list = ['decision :', 'for']

def verif_exist(word_list, paragraph):
    exist = False
    for word in word_list:
        exp = re.escape(word)
        print(exp)
        if re.search(r"\b%s\b" % exp, paragraph, re.IGNORECASE):
            print("From exist, word detected: " + word)
            exist = True
        if exist == True:
            break
    return exist

if verif_exist(word_list, texte):
    print("exist")
else:
    print("not exist")

Solution

  • The documentation states: "\b matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of word characters." There is no word boundary between ':' and the following space, because neither of them is a word character, so the trailing \b in your pattern can never match here.
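A minimal sketch of that behaviour on a shortened version of the question's text: the pattern with a trailing \b finds nothing, while the same pattern without it matches, because the \b before 'd' sits next to a word character.

```python
import re

text = "decision :   repair"
# re.escape makes the search term safe to embed in a regular expression
pattern = re.escape("decision :")

# Fails: after ':' comes a space, and \b needs a word character on one side
print(re.search(rf"\b{pattern}\b", text))  # None

# Succeeds: the leading \b is next to 'd', which is a word character
print(re.search(rf"\b{pattern}", text) is not None)  # True
```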

    One option is to accept either a word boundary or a whitespace character after the escaped word:

    import re
    
    texte = "  hello \n a formated test text   \n decision :   repair \n toto \n titi"
    word_list = ['decision :', 'for']
    
    
    def verif_exist(word_list, paragraph):
        for word in word_list:
            exp = re.escape(word)
            print(exp)
            if re.search(fr"\b{exp}(\b|\s)", paragraph, re.IGNORECASE):
                print("From exist, word detected: " + word)
                return True
        return False
    
    
    if verif_exist(word_list, texte):
        print("exist")
    else:
        print("not exist")
    

    That's still not perfect. Consider what happens if your text is just 'decision :'. There is no word boundary and no whitespace after the ':', so we also have to accept the end of the text ($), which gives:

        if re.search(fr"\b{exp}(\b|\s|$)", paragraph, re.IGNORECASE):
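A quick check of the difference on a text that ends right after the ':'. The helper found is hypothetical and only wraps re.search for readability:

```python
import re

exp = re.escape("decision :")

def found(pattern, text):
    """Hypothetical helper: True if pattern occurs in text (case-insensitive)."""
    return re.search(pattern, text, re.IGNORECASE) is not None

# The text ends right after ':', so neither \b nor \s can match there
print(found(fr"\b{exp}(\b|\s)", "decision :"))    # False
# Adding $ accepts the end of the text as a valid right-hand edge
print(found(fr"\b{exp}(\b|\s|$)", "decision :"))  # True
# 'for' inside 'formated' is still rejected: 'm' is neither \b, \s nor $
print(found(fr"\bfor(\b|\s|$)", "a formated test"))  # False
```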
    

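    You might also need something similar at the beginning of the regular expression: a leading \b fails when the search word starts with a non-word character (for example ': repair' preceded by a space), so you could accept the start of the text, a word boundary or a whitespace there as well.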
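A sketch of verif_exist with the left edge handled the same way as the right edge. This is one possible version under the assumptions above, not a definitive implementation; the debug prints are dropped for brevity:

```python
import re

def verif_exist(word_list, paragraph):
    """Return True if any word in word_list occurs as a whole term in paragraph."""
    for word in word_list:
        exp = re.escape(word)
        # Left edge: start of text, a word boundary or a whitespace character
        # Right edge: a word boundary, a whitespace character or end of text
        if re.search(fr"(^|\b|\s){exp}(\b|\s|$)", paragraph, re.IGNORECASE):
            return True
    return False

texte = "  hello \n a formated test text   \n decision :   repair \n toto \n titi"
print(verif_exist(['decision :'], texte))              # True
print(verif_exist(['for'], texte))                     # False
print(verif_exist([': repair'], "decision : repair"))  # True
```

Note that the trailing \b still accepts 'decision :' in 'decision :repair', since there is a boundary between ':' and 'r'. If that is not wanted, the lookarounds (?<!\w) and (?!\w) are a common alternative: they only require that no word character touches the term, and they also succeed at the start and end of the text.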