
How to compare one file with other files in Python


I am new to Python :( and here is what I want to do:

Main file (tokens):

    beautiful 2
    amazing 5
    speechless 2

A folder with 73 source files.

How can I write a Python script to check the source frequency? For each word in the main file, I want to count in how many of the source files it appears. The expected results would be, for example: the word beautiful appears in 55 sources, the word amazing appears in 30 sources, the word speechless appears in 73 sources.

from os import listdir

with open("C:/Users/ell/Desktop/Archivess/test/rez.txt", "w") as f:
    for filename in listdir("C:/Users/ell/Desktop/Archivess/test/sources/books/"):
        with open('C:/Users/ell/Desktop/Archivess/test/freqs/books/' + filename) as currentFile:
            text = currentFile.read()

            if 'amazing' in text:
                f.write('The word exists in the file ' + filename[:-4] + '\n')
            else:
                f.write('The word does not exist in the file ' + filename[:-4] + '\n')

I have written this code, but it only checks the one word that I hard-coded in the loop. How can I make it check every word from the main file against all the files? I appreciate any help.


Solution

  • You can do it this way. After building a dictionary of tokens, go through each file and increment a token's count if it appears in that file.

    import os
    
    token_file = "token.txt"
    main_dir = "PATH/TO/DIR"
    with open(token_file, "r") as f:
        # Build a dict mapping each token to a count of 0.
        # Each line looks like "beautiful 2", so take only the first field (the word).
        tokens = {line.split()[0]: 0 for line in f if line.strip()}
    
    for filename in os.listdir(main_dir):
        path = os.path.join(main_dir, filename)  # path to source file
        with open(path, "r") as fp:
            text = fp.read()
            for token in tokens:        # check every token
                if token in text:       # if the token is found in the text
                    tokens[token] += 1  # increment its count
    print(tokens)
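
  • With the counts from the question's example, print(tokens) would end up as something like {'beautiful': 55, 'amazing': 30, 'speechless': 73}. If you also want to match whole words only and write the results to a file like rez.txt in the question (e.g. "The word beautiful appears in 55 sources"), here is a minimal sketch along the same lines. It is only a variant under assumptions, not part of the answer above: token.txt, PATH/TO/DIR and rez.txt are placeholders, the token file is assumed to have one "word count" pair per line, and the set of words avoids the substring issue where token in text would also match inside longer words (e.g. "art" inside "heart").

    import os
    
    token_file = "token.txt"      # placeholder: file with one "word count" pair per line
    main_dir = "PATH/TO/DIR"      # placeholder: folder with the 73 source files
    result_file = "rez.txt"       # placeholder: output file, as in the question
    
    with open(token_file, "r") as f:
        tokens = {line.split()[0]: 0 for line in f if line.strip()}
    
    for filename in os.listdir(main_dir):
        path = os.path.join(main_dir, filename)
        with open(path, "r") as fp:
            # Split the file into a set of whole words, so "art" does not
            # match "heart" (punctuation attached to words is not stripped here).
            words = set(fp.read().split())
            for token in tokens:
                if token in words:
                    tokens[token] += 1
    
    with open(result_file, "w") as out:
        for token, count in tokens.items():
            out.write("The word " + token + " appears in " + str(count) + " sources\n")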