python-docx

Open docx documents from list of files


I have a long list of docx files stored in several directories, and I need to find which of them contain specific strings.

I have all files in a list like this one (coming from a pandas df column):

files=[r'C:\AAA\BBB\file1.docx', r'G:\CCC\DDD\file2.docx', ...]

I also have a list of strings like this one: strings=['hksdhus','jshaòohse','iueoiwu']

Here is my code:

    for string in strings:
        for file in files:
            doc = docx.Document(file)
            for para in doc.paragraphs:
                if string in para.text:
                    print(string + ' is in ' + file)
                    break
                else:
                    print(string + ' not found')
                    break

It always prints "string not found". I think that's because it isn't reading the file at all; if I try print(para.text), it prints a blank line.

Can anyone help me with this?

Thanks in advance for any suggestion


Solution

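  • Your code only checks the first paragraph of each document, because both branches of the if/else end in a break: the loop exits after the first paragraph whether or not the string is found. If that first paragraph happens to be empty, print(para.text) shows a blank line, which can look as if the file was never read.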

    Remove the 3-line else block and try again. You could instead remove only the break on the last line, but then the else branch would print a "not found" message for every paragraph that does not contain the string, until a match is reached.
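
As a rough sketch of the fix described above, the search could be restructured so that every paragraph is checked and the result is reported once per file rather than once per paragraph. The helper names contains_string and search_files are hypothetical, not part of python-docx; the only library calls assumed are docx.Document(path) and the .paragraphs / .text attributes already used in the question.

```python
def contains_string(paragraph_texts, needle):
    """Return True if any paragraph text contains needle."""
    # any() stops at the first match but, unlike the original loop,
    # keeps going past paragraphs that do not match.
    return any(needle in text for text in paragraph_texts)


def search_files(files, strings):
    """Return (string, file) pairs for every string found in a file."""
    # python-docx; imported here so contains_string() works without it.
    import docx

    hits = []
    for file in files:
        # Open each document once and reuse its text for every string.
        texts = [para.text for para in docx.Document(file).paragraphs]
        for string in strings:
            if contains_string(texts, string):
                hits.append((string, file))
    return hits


# Usage, with the files and strings lists from the question:
# for string, file in search_files(files, strings):
#     print(string + ' is in ' + file)
```

Swapping the loop order so that files are the outer loop is a design choice, not something the answer requires: it avoids reopening every document once per search string. Note also that doc.paragraphs only covers body-level paragraphs, so text inside tables would not be searched by this sketch.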