
Matching strings with multiple lists for a search function


I have a search function that looks for a string in .docx files. I want to split the input with search_terms = x.split() so that I can search for multiple terms.

Right now, if I enter two separate terms, the function treats the whole input as a single term and looks for that exact string in the documents.

Splitting takes care of separating the terms into different strings, but I'm not sure how to associate the file names, the text in the files, and the search strings. Any suggestions would be much appreciated!

import os
import docx2txt

os.chdir('c:/user/path/to/files')

path = ('c:/user/path/to/files')

files = []

x = str(input("search: "))

for file in os.listdir(path):
    if file.endswith('.docx'):
        files.append(file)

for i in range(len(files)):
    text = docx2txt.process(files[i])
    if x.upper() in text.upper() or x.lower() in text.lower():
        print (files[i])


Solution

  • Try the following:

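    import os
    import docx2txt

    path = 'c:/user/path/to/files'
    os.chdir(path)  # docx2txt.process() below is given bare file names

    # Collect the .docx files in the folder
    files = [f for f in os.listdir(path) if f.endswith('.docx')]

    # Split the input on whitespace so each word is a separate search term
    search_terms = input("search: ").split()

    for file in files:
        text = docx2txt.process(file).upper()  # uppercase once per file
        # Case-insensitive match: print the file if any term occurs in it
        if any(x.upper() in text for x in search_terms):
            print(file)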

    Suggested changes:

    • I've removed or x.lower() in text.lower(): comparing the uppercased strings already makes the match case-insensitive, so the second test is redundant.
    • any(...) is true as soon as one x in search_terms (the list of terms to look for) appears in text, so a file is printed if it contains at least one of the terms.
    • The files list can be built in one line with a list comprehension: [f for f in os.listdir(path) if f.endswith('.docx')]
    • files is an iterable, so the for loop can go over it directly; there is no need for range and indexing.
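
If you ever need a file to contain every term rather than at least one, the same pattern works with all() in place of any(). Here is a minimal sketch of both. The names contains_any and contains_all are hypothetical helpers made up for illustration (they are not part of docx2txt), and I've used str.casefold() instead of upper() as one way to normalize case.

```python
def contains_any(text, terms):
    """Return True if at least one term occurs in text, ignoring case."""
    haystack = text.casefold()  # normalize the text once, not once per term
    return any(term.casefold() in haystack for term in terms)


def contains_all(text, terms):
    """Return True only if every term occurs in text, ignoring case."""
    haystack = text.casefold()
    return all(term.casefold() in haystack for term in terms)


# Stand-in text; in the script above this would be the docx2txt.process() result
sample = "Quarterly Budget Report"
terms = "budget forecast".split()
print(contains_any(sample, terms))  # True: "budget" is present
print(contains_all(sample, terms))  # False: "forecast" is missing
```

Note that with an empty list of terms any() gives False and all() gives True, so you may want to check for empty input first.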

    Let me know if it works!
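
On associating file names with the terms they contain: one possible approach is to build a dict that maps each file name to the list of terms found in it. This is only a sketch under some assumptions. matched_terms and search_documents are hypothetical names, and the texts dict uses made-up strings so the example runs without any .docx files; in the real script each value would come from docx2txt.process(file).

```python
def matched_terms(text, terms):
    """Return the terms that occur in text, ignoring case."""
    haystack = text.casefold()
    return [term for term in terms if term.casefold() in haystack]


def search_documents(texts, terms):
    """Map each file name to the terms found in its text; skip files with no hits."""
    results = {}
    for name, text in texts.items():
        hits = matched_terms(text, terms)
        if hits:
            results[name] = hits
    return results


# Made-up stand-in data: file name -> extracted text
texts = {
    "report.docx": "Quarterly budget report",
    "notes.docx": "Meeting notes about the forecast",
    "todo.docx": "Buy milk",
}

for name, hits in search_documents(texts, "Budget forecast".split()).items():
    print(name, "->", ", ".join(hits))
```

Each file is listed together with the terms that matched it, and files with no matches are left out of the result.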