
How to use re.IGNORECASE in a grep-like search with python-docx


I would like to write a grep-like function with python-docx.

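import glob

from docx import Document

files = glob.glob("Folders/*.docx")
fetchWord = "sample"

for file in files:
    document = Document(file)
    # Number paragraphs from 1 so the "Line" label is meaningful.
    for count, para in enumerate(document.paragraphs, start=1):
        # Case-sensitive search: finds "sample" only.
        if para.text.find(fetchWord) > -1:
            print(file + ":" + "Line" + str(count) + ":" + para.text)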

With this code, I can find only "sample", but not "Sample", "sAmPle" and so on.

To match these variants as well, I would like to use the re.IGNORECASE flag in the code above. How do I do this?


Solution

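  • re.IGNORECASE is a flag of Python's re module, so if you need a regex, this is the way to go. Use re.search rather than re.match: re.match only matches at the start of the string, so it would miss a word that appears in the middle of a paragraph. If fetchWord should be treated as literal text, wrap it in re.escape(). You can do so like this:

    import glob
    import re

    from docx import Document

    files = glob.glob("Folders/*.docx")
    fetchWord = "sample"

    for file in files:
        document = Document(file)
        # Number paragraphs from 1 so the "Line" label is meaningful.
        for count, para in enumerate(document.paragraphs, start=1):
            # re.search scans the whole paragraph, ignoring case.
            if re.search(fetchWord, para.text, re.IGNORECASE):
                print(file + ":" + "Line" + str(count) + ":" + para.text)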
    

    However, if you simply want to search for plain text and do not need a regex, you can use the .lower() method to convert both the paragraph text and the search word to lowercase before comparing them. Like this:

    import glob

    from docx import Document

    files = glob.glob("Folders/*.docx")
    fetchWord = "sample"

    for file in files:
        document = Document(file)
        # Number paragraphs from 1 so the "Line" label is meaningful.
        for count, para in enumerate(document.paragraphs, start=1):
            # Lowercase both sides so the comparison ignores case.
            if fetchWord.lower() in para.text.lower():
                print(file + ":" + "Line" + str(count) + ":" + para.text)
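
To try the matching logic without any .docx files, the search can be pulled into a small helper that works on plain strings. This is only a sketch: grep_lines and its ignore_case parameter are hypothetical names, not part of python-docx or the re module, and the sample paragraphs are made up. It assumes the search word is literal text, so it is passed through re.escape.

```python
import re


def grep_lines(lines, word, ignore_case=True):
    """Return (line_number, text) pairs for the lines that contain `word`.

    Hypothetical helper for illustration; not part of python-docx.
    """
    flags = re.IGNORECASE if ignore_case else 0
    # re.escape makes `word` match literally, even if it contains
    # regex metacharacters such as "." or "*".
    pattern = re.compile(re.escape(word), flags)
    # pattern.search scans the whole string; pattern.match would only
    # test the beginning of each line.
    return [
        (number, text)
        for number, text in enumerate(lines, start=1)
        if pattern.search(text)
    ]


# Made-up paragraph texts standing in for a document's contents.
paragraphs = ["This is a Sample.", "Nothing here.", "sAmPle at the start"]
for number, text in grep_lines(paragraphs, "sample"):
    print("Line" + str(number) + ":" + text)
```

With python-docx, the list of strings could be built as [para.text for para in Document(file).paragraphs] and passed to the helper for each file.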