Word count per file for all files in a folder, using a for loop

I want to get the word frequency for each file in a folder, but my code does not work.

The error was as follows:

C:\Python\Anaconda3\python.exe C:/Python/Anaconda3/frequency.py
Traceback (most recent call last):
  File "C:/Python/Anaconda3/frequency.py", line 6, in <module>
    for word in file.read().split():
NameError: name 'file' is not defined

Process finished with exit code 1

How can I get this to work? Thank you.

import glob
import os
path = 'C:\Python\Anaconda3'
for filename in glob.glob(os.path.join(path, '*.txt')):
    wordcount = {}
    for word in file.read().split():
        if word not in wordcount:
            wordcount[word] = 1
        else:
            wordcount[word] += 1
print(word, wordcount)

Solution

As the code stands, you have three obvious errors (although there may be more).

First, your for loop names its loop variable filename, but inside the loop you read from file, which was never defined. That is the NameError in your traceback:

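for filename in glob.glob(os.path.join(path, '*.txt')):  # loop variable is filename
    ...
    for word in file.read().split():  # ...but this line uses file
        ...

Also note that filename is only a path string, so you still have to open it (for example with open(filename) as f:) before you can call read().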

Second, the wordcount dictionary is re-initialized (and thus erased) on every iteration of your outer for loop. You can fix this in two ways, depending on what you are after:

a. Move the line wordcount = {} above the outer for loop so the dictionary is not cleared for each file. This gives you a combined word count across all files.

b. After each iteration, store wordcount in another dictionary, filecounts, whose keys are filenames and whose values are the per-file word-count dictionaries. This can be a bit confusing because you now have a dictionary of dictionaries: the count of a given word in a given file is filecounts[filename][word].
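Option a can be sketched as follows, assuming the same folder layout as in the question. The dictionary is created once, before the loop, so counts accumulate across every .txt file. The function name count_words_total is hypothetical, chosen for illustration:

```python
import glob
import os


def count_words_total(path):
    """Return one dictionary of word counts combined over every .txt file in path."""
    wordcount = {}  # created once, before the loop, so it is not reset per file
    for filename in glob.glob(os.path.join(path, '*.txt')):
        # filename is a path string; open the file to get something readable
        with open(filename) as f:
            for word in f.read().split():
                if word not in wordcount:
                    wordcount[word] = 1
                else:
                    wordcount[word] += 1
    return wordcount
```

Calling count_words_total(r'C:\Python\Anaconda3') would return a single dictionary, so you lose the information about which file each word came from. That is the trade-off against option b.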

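Third, your print call sits outside both loops, so it prints only the last word read, followed by the entire dictionary. To print each word with its count, consider the following instead: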

for word in wordcount:
    print('{word}:\t{count}'.format(word=word, count=wordcount[word]))
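If you also want the words ordered by frequency, a small variation iterates over wordcount.items(), which yields (word, count) pairs, and sorts them by count. The helper name format_counts is hypothetical, and the sorting goes slightly beyond what the question asked for:

```python
def format_counts(wordcount):
    """Return one 'word:<tab>count' line per word, most frequent word first."""
    # items() yields (word, count) pairs, so no second lookup wordcount[word] is needed
    pairs = sorted(wordcount.items(), key=lambda item: item[1], reverse=True)
    return ['{word}:\t{count}'.format(word=word, count=count) for word, count in pairs]


for line in format_counts({'cat': 1, 'the': 3}):
    print(line)
```

Returning a list of lines instead of printing directly makes the formatting easy to check or to write to a file later.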

I would also suggest using collections.defaultdict (see the docs); it removes the need to check whether a word is already in the dictionary before setting its count to 1.
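To see the difference, here is a small comparison on a made-up list of words: the plain-dict version needs the membership check, while defaultdict(int) supplies 0 for any missing key:

```python
from collections import defaultdict

words = 'the cat sat on the mat'.split()

# With a plain dict you must check whether the word is already a key
plain = {}
for word in words:
    if word not in plain:
        plain[word] = 1
    else:
        plain[word] += 1

# defaultdict(int) creates a missing key with int(), i.e. 0, so += 1 always works
counts = defaultdict(int)
for word in words:
    counts[word] += 1

print(dict(counts))  # {'the': 2, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 1}
```

One side effect to keep in mind: merely reading counts['dog'] inserts 'dog' with a count of 0. The standard library's collections.Counter(words) produces the same counts in a single call if you prefer that.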

Putting it all together, I would write it like this:

from collections import defaultdict
import glob
import os

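path = r'C:\Python\Anaconda3'  # raw string: backslashes are kept literally
filecounts = {}  # maps each filename to its word counts

for filename in glob.glob(os.path.join(path, '*.txt')):
    wordcount = defaultdict(int)  # a missing word starts at 0
    # filename is only a path string, so open the file before reading it
    with open(filename) as f:
        for word in f.read().split():
            wordcount[word] += 1

    filecounts[filename] = wordcount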

for filename in filecounts:
    print('Word count for file \'{file}\''.format(file=filename))
    for word in filecounts[filename]:
        print('\t{word}:\t{count}'.format(word=word, count=filecounts[filename][word]))
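If you want something you can call and test, the same per-file approach can be wrapped in a function. This sketch swaps defaultdict for collections.Counter, a dict subclass in the standard library built for counting, and uses its most_common() method to print the most frequent words first. The function name count_words_per_file is hypothetical:

```python
import glob
import os
from collections import Counter


def count_words_per_file(path):
    """Map each .txt file in path to a Counter of the words it contains."""
    filecounts = {}
    for filename in glob.glob(os.path.join(path, '*.txt')):
        with open(filename) as f:
            # Counter tallies each item of the iterable it is given
            filecounts[filename] = Counter(f.read().split())
    return filecounts


if __name__ == '__main__':
    # Raw string so the backslashes are kept literally
    for filename, counts in count_words_per_file(r'C:\Python\Anaconda3').items():
        print("Word count for file '{}'".format(filename))
        # most_common() with no argument lists every word, highest count first
        for word, count in counts.most_common():
            print('\t{}:\t{}'.format(word, count))
```

Because each value is a Counter, filecounts[filename][word] still works exactly as described above, and a word that does not appear in a file returns 0 instead of raising KeyError.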