I would like to write a grep-like function with python-docx.
import glob
from docx import Document

files = glob.glob("Folders/*.docx")
fetchWord = "sample"
for file in files:
    document = Document(file)
    count = 0
    for para in document.paragraphs:
        count += 1  # paragraph number within the file
        if para.text.find(fetchWord) > -1:
            print(file + ":" + "Line" + str(count) + ":" + para.text)
With this code, I can match only "sample", but not "Sample", "sAmPle", and so on.
To match these variants as well, I would like to use re.IGNORECASE in the code above. How do I do this?
You can use Python's re module if you want to use re.IGNORECASE. If you need a regex, this is the way to go. You can do so like this:
import glob
import re
from docx import Document

files = glob.glob("Folders/*.docx")
fetchWord = "sample"
for file in files:
    document = Document(file)
    count = 0
    for para in document.paragraphs:
        count += 1  # paragraph number within the file
        # re.search scans the whole paragraph; re.match would only test its start
        if re.search(fetchWord, para.text, re.IGNORECASE) is not None:
            print(file + ":" + "Line" + str(count) + ":" + para.text)
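One detail worth checking when using a regex here: re.match only tests the beginning of the string, whereas re.search looks anywhere in the paragraph. The sketch below uses plain strings in place of python-docx paragraphs so it runs without any .docx files; the helper name grep_lines is hypothetical, and using re.escape assumes the search word should be treated as literal text rather than as a pattern.

```python
import re

def grep_lines(lines, word):
    """Return (line_number, text) pairs whose text contains `word`, ignoring case."""
    # re.escape treats `word` as literal text; drop it to allow real regex patterns.
    pattern = re.compile(re.escape(word), re.IGNORECASE)
    return [(n, text) for n, text in enumerate(lines, start=1) if pattern.search(text)]

lines = ["Sample at the start", "a sAmPle in the middle", "nothing here"]

# search finds the word anywhere in each line
print(grep_lines(lines, "sample"))

# match only tests the start of each line, so the second line is missed
print([bool(re.match("sample", text, re.IGNORECASE)) for text in lines])
```

With python-docx, the same helper could be fed `[p.text for p in document.paragraphs]`.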
However, if you simply want to search for text and do not need a regex, you can use the .lower() method to convert the paragraph text to lowercase before searching. The search word must be lowercase as well for this to work. Like this:
import glob
from docx import Document

files = glob.glob("Folders/*.docx")
fetchWord = "sample"
for file in files:
    document = Document(file)
    count = 0
    for para in document.paragraphs:
        count += 1  # paragraph number within the file
        # lowercase both sides so the comparison ignores case
        if para.text.lower().find(fetchWord.lower()) > -1:
            print(file + ":" + "Line" + str(count) + ":" + para.text)
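For a plain substring test, the `in` operator may read more naturally than find() > -1, and str.casefold() is a slightly more thorough alternative to lower() for non-ASCII text. This is only a sketch on plain strings; the helper name contains_ignore_case is hypothetical and not part of python-docx.

```python
def contains_ignore_case(text, word):
    """Return True if `word` occurs anywhere in `text`, ignoring case."""
    # casefold() works like lower() but also folds cases such as German "ß" to "ss"
    return word.casefold() in text.casefold()

print(contains_ignore_case("This is a SAMPLE paragraph", "sample"))
print(contains_ignore_case("No match in this one", "sample"))
```

In the loop above, the condition would then become `if contains_ignore_case(para.text, fetchWord):`.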