I have a list containing paths to .txt files in a folder that I would like to search. I am looking for specific phrases, different for each file. I would like to categorize them all, so I search the known categories and display the file if there is a match. Unfortunately, I have no idea how to print the files if there is no match.
I'd like to print the files without a match.
FList = []  # list which contains the paths to the .txt files
# searching for phrases
for i in FList:
    with open(i) as txtfile:
        if 'word1' in txtfile.read():
            print("Category I")
    with open(i) as txtfile:
        if 'word2' in txtfile.read():
            print("Category II")
    with open(i) as txtfile:
        if 'word3' in txtfile.read():
            print("Category III")
I tried to solve the problem with logical operators:
with open(i) as txtfile:
    if not(('word1') or ('word2') or ('word3')) in txtfile.read():
        print(i)
but in this case the only string that gets searched for is always the first one (in this example, word1).
I know this isn't the most efficient code, but that's not the point here.
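For what it's worth, the behaviour described above follows from how Python evaluates `or` between strings: the chain returns its first truthy operand, so the condition ends up testing only 'word1'. The snippet below is a minimal sketch of that, plus one possible fix using `any()`; the `text` value is a made-up stand-in for a file's contents, not something from the real files.

```python
# Hypothetical sample text standing in for the contents of one file.
text = "this file mentions word2 only"

words = ("word1", "word2", "word3")

# `or` returns its first truthy operand, so the chain collapses to 'word1'.
collapsed = ('word1') or ('word2') or ('word3')

# What the original condition effectively tests: only 'word1' is looked up,
# so this file is wrongly reported as having no match.
original_check = not (('word1') or ('word2') or ('word3')) in text

# Testing each word separately: True only when none of the words occur.
no_match = not any(word in text for word in words)

print(collapsed, original_check, no_match)
```

With the `any()` form, `print(i)` would run only for files that contain none of the listed words.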
... I would like to categorize them all so I search the known categories and display the file if there is a match ...
You can try something simple like this:
dmap = {
    "word1": "Category I",
    "word2": "Category II",
    "word3": "Category III",
    # ... add here more mappings
}
out = {}
for fp in FList:
    categories = []
    with open(fp) as tf:
        data = tf.read()
    for word, cat in dmap.items():
        if word in data:
            categories.append(cat)
    out[fp] = categories if categories else "No Match"
The output should look like this:
{
    'tmp/file1.txt': 'No Match',
    'tmp/file2.txt': ['Category II'],
    'tmp/file3.txt': 'No Match',
    ...
    'tmp/file7.txt': 'No Match',
    'tmp/file8.txt': ['Category I', 'Category II', 'Category III'],
    'tmp/file9.txt': ['Category III']
}
You can then filter this dictionary to return the file paths with or without a match:
>>> [fp for (fp, m) in out.items() if m != "No Match"] # files with at least one match
>>> [fp for (fp, m) in out.items() if m == "No Match"] # files with no match
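
If you want to try the idea end to end, here is a small self-contained sketch of the same approach wrapped in a function. It is a variation, not a drop-in replacement: the `categorize` helper, the sample file names and their contents are all hypothetical, and it stores an empty list instead of the "No Match" string so that every value has the same type.

```python
import tempfile
from pathlib import Path

# Same mapping as above; extend as needed.
dmap = {
    "word1": "Category I",
    "word2": "Category II",
    "word3": "Category III",
}

def categorize(paths, mapping):
    """Return {path: [categories]}; an empty list means no match."""
    result = {}
    for fp in paths:
        data = Path(fp).read_text()  # read each file only once
        result[fp] = [cat for word, cat in mapping.items() if word in data]
    return result

# Throwaway sample files (hypothetical contents) to exercise the function.
with tempfile.TemporaryDirectory() as tmp:
    samples = {
        "a.txt": "nothing relevant",
        "b.txt": "word2 here",
        "c.txt": "word1 and word3",
    }
    paths = []
    for name, content in samples.items():
        p = Path(tmp) / name
        p.write_text(content)
        paths.append(str(p))

    out = categorize(paths, dmap)
    # Files whose category list is empty had no match.
    unmatched = [Path(fp).name for fp, cats in out.items() if not cats]
    matched = {Path(fp).name: cats for fp, cats in out.items() if cats}

print(unmatched)
print(matched)
```

With empty lists as the "no match" marker, the filters reduce to `if not cats` and `if cats`, with no string comparison needed.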