Tags: python, python-3.x, string, txt

Searching in .txt files: printing all files without a match


I have a list containing paths to .txt files in a folder that I would like to search. I am looking for specific phrases, which differ from file to file. I would like to categorize them all, so I search for the known categories and display the file if there is a match. Unfortunately, I have no idea how to print the files where there is no match.

I'd like to print the files without a match.

FList = []  # list which contains the paths to the .txt files

# searching for phrases
for i in FList:
    with open(i) as txtfile:
        if 'word1' in txtfile.read():
            print("Category I")
    with open(i) as txtfile:
        if 'word2' in txtfile.read():
            print("Category II")
    with open(i) as txtfile:
        if 'word3' in txtfile.read():
            print("Category III")
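If the goal is only to also print the files in which none of the phrases occur, the loop above can be kept almost as is: read each file once, then use a flag to remember whether any phrase matched. A minimal sketch; the three demo files written to a temporary folder (and their contents) are made up so the snippet runs on its own, whereas in the question FList is already filled.

```python
import tempfile
from pathlib import Path

# Demo setup with hypothetical files; in the question FList already holds the paths.
tmpdir = Path(tempfile.mkdtemp())
samples = {"a.txt": "contains word1", "b.txt": "nothing here", "c.txt": "word2 and word3"}
for name, text in samples.items():
    (tmpdir / name).write_text(text)
FList = sorted(str(p) for p in tmpdir.glob("*.txt"))

unmatched = []  # paths of files in which none of the phrases occur
for i in FList:
    with open(i) as txtfile:
        data = txtfile.read()  # read the file once, then search the string
    matched = False
    if 'word1' in data:
        print("Category I")
        matched = True
    if 'word2' in data:
        print("Category II")
        matched = True
    if 'word3' in data:
        print("Category III")
        matched = True
    if not matched:
        print(i)  # no phrase found: print the file path
        unmatched.append(i)
```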

I tried to solve the problem with logical operators:

with open(i) as txtfile:
    if not(('word1') or ('word2') or ('word3')) in txtfile.read():
       print(i)

but in this case the only string that actually gets searched for is the first one (in this example, word1).
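The reason only word1 is ever searched for: or returns its first truthy operand, so ('word1') or ('word2') or ('word3') evaluates to just 'word1', and because not binds more loosely than in, the condition reads not ('word1' in data). A sketch of the usual fix, checking each phrase separately and combining the results with the built-in any(); the helper name has_no_match is made up for illustration.

```python
# `or` returns its first truthy operand, so the whole expression is 'word1'.
print(('word1') or ('word2') or ('word3'))  # word1

# `not` binds more loosely than `in`, so the original condition is really
#     not ('word1' in data)
# To test every phrase, check each one and combine the results with any().
phrases = ['word1', 'word2', 'word3']

def has_no_match(data, phrases):
    """Return True when none of the phrases occur in data."""
    return not any(p in data for p in phrases)

print(has_no_match("only word2 here", phrases))   # False
print(has_no_match("nothing relevant", phrases))  # True
```

Inside the question's loop this becomes data = txtfile.read() followed by if has_no_match(data, phrases): print(i).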

I know this isn't the most efficient code, but that's not the point here.


Solution

  • ... I would like to categorize them all so I search the known categories and display the file if there is a match ...

    You can try something simple like the following:

    dmap = {
        "word1": "Category I",
        "word2": "Category II",
        "word3": "Category III",
        # ... add more mappings here
    }
    
    out = {}
    
    for fp in FList:
        categories = []
        with open(fp) as tf:
            data = tf.read()
            for word, cat in dmap.items():
                if word in data:
                    categories.append(cat)
    
        out[fp] = categories if categories else "No Match"
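    A self-contained variant of the same idea, wrapped in a function so it can be reused. The function name categorize and the demo files are hypothetical. Unlike the code above, it stores an empty list for a file without a match instead of the string "No Match", so every value is a list and the no-match filter becomes simply "not cats".

```python
import tempfile
from pathlib import Path

dmap = {
    "word1": "Category I",
    "word2": "Category II",
    "word3": "Category III",
}

def categorize(paths, dmap):
    """Map each file path to the list of categories whose phrase it contains.

    A file without any match maps to an empty list.
    """
    out = {}
    for fp in paths:
        with open(fp) as tf:
            data = tf.read()  # read each file only once
        out[fp] = [cat for word, cat in dmap.items() if word in data]
    return out

# Demo with hypothetical files in a temporary folder.
tmpdir = Path(tempfile.mkdtemp())
demo = {"file1.txt": "no phrase", "file2.txt": "has word2", "file3.txt": "word1 word2 word3"}
for name, text in demo.items():
    (tmpdir / name).write_text(text)

FList = sorted(str(p) for p in tmpdir.glob("*.txt"))
out = categorize(FList, dmap)
no_match = [fp for fp, cats in out.items() if not cats]  # files without any match
```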
    

    The output should look something like this:

    {
        'tmp/file1.txt': 'No Match',
        'tmp/file2.txt': ['Category II'],
        'tmp/file3.txt': 'No Match',
        ...
        'tmp/file7.txt': 'No Match',
        'tmp/file8.txt': ['Category I', 'Category II', 'Category III'],
        'tmp/file9.txt': ['Category III']
    }
    

    You can then filter this dictionary to get the file paths with or without a match:

    >>> [fp for (fp, m) in out.items() if m != "No Match"] # files with at least one match
    
    >>> [fp for (fp, m) in out.items() if m == "No Match"] # files with no match
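    Since the question asks to print the files without a match, the paths can also be split in a single pass and printed directly. A small sketch; the out dictionary below is a hypothetical stand-in with the same shape as the one built above (either the string "No Match" or a list of categories).

```python
# Hypothetical stand-in for the `out` dictionary built by the answer's loop.
out = {
    "tmp/file1.txt": "No Match",
    "tmp/file2.txt": ["Category II"],
    "tmp/file8.txt": ["Category I", "Category II", "Category III"],
}

# Split the paths in one pass instead of filtering the dictionary twice.
matched, unmatched = [], []
for fp, m in out.items():
    if m == "No Match":
        unmatched.append(fp)
    else:
        matched.append(fp)

# Print the files without a match, one per line.
for fp in unmatched:
    print(fp)

# Print the matched files together with their categories.
for fp in matched:
    print(fp, "->", ", ".join(out[fp]))
```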