I have a search function that looks for a string in a .docx file. I want to add a split call, `search_terms = x.split()`, so I can have multiple search terms.
Right now, if I search for two separate terms, the function will interpret the string as one term and look for that one term in the documents.
The split function takes care of separating the terms into different strings, but then I'm not sure how to associate the file names, text in the files, and the strings. Any suggestions would be much appreciated!
import os
import docx2txt
os.chdir('c:/user/path/to/files')
path = ('c:/user/path/to/files')
files = []
x = str(input("search: "))
for file in os.listdir(path):
    if file.endswith('.docx'):
        files.append(file)
for i in range(len(files)):
    text = docx2txt.process(files[i])
    if x.upper() in text.upper() or x.lower() in text.lower():
        print(files[i])
Try the following:
import os
import docx2txt
os.chdir('c:/user/path/to/files')
path = ('c:/user/path/to/files')
files = [f for f in os.listdir(path) if f.endswith('.docx')]
search_terms = str(input("search: ")).split()
for file in files:
    text = docx2txt.process(file)
    if any(x.upper() in text.upper() for x in search_terms):
        print(file)
Suggested fixes, IMHO:
- Removed `or x.lower() in text.lower()` because it's redundant.
- Used `any()` so that if any `x` in `search_terms` (a list of terms to look for) appears in `text`, it will be matched.
- Used the list comprehension `[f for f in os.listdir(path) if f.endswith('.docx')]` instead of the `for` loop.

Let me know if it works!
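If you also want to see which terms matched in which file (the "associate the file names, text, and strings" part of the question), here is a minimal sketch of one way to do it. The names `matched_terms`, `search_documents`, and `sample_documents` are made up for illustration, and the sample dict stands in for text that would normally come from `docx2txt.process(file)`, so the snippet runs without any .docx files. It uses `str.casefold()` for the case-insensitive comparison, which is a substitution for the `upper()` calls in the script above.

```python
def matched_terms(text, search_terms):
    """Return the search terms that occur in text, ignoring case."""
    haystack = text.casefold()  # normalize the document text once
    return [term for term in search_terms if term.casefold() in haystack]


def search_documents(documents, query):
    """Map each document name to the terms from query found in its text.

    documents is a hypothetical dict of {file name: extracted text}; in the
    script above the text would come from docx2txt.process(file).
    """
    search_terms = query.split()
    results = {}
    for name, text in documents.items():
        hits = matched_terms(text, search_terms)
        if hits:  # keep only files that matched at least one term
            results[name] = hits
    return results


# Stand-in data instead of real .docx files (assumption for the example)
sample_documents = {
    "report.docx": "Quarterly budget and revenue forecast",
    "notes.docx": "Meeting notes about the BUDGET",
    "recipe.docx": "Flour, sugar, eggs",
}

for name, hits in search_documents(sample_documents, "budget revenue").items():
    print(name, "->", ", ".join(hits))
```

To require every term instead of any term, you could keep a file only when `len(hits) == len(search_terms)`.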