Tags: python, for-loop, word-count

word count in all files using for loop


I want to get the word frequency for each file in a folder, but my code does not work.

The error was as follows:

C:\Python\Anaconda3\python.exe C:/Python/Anaconda3/frequency.py
Traceback (most recent call last):
  File "C:/Python/Anaconda3/frequency.py", line 6, in <module>
    for word in file.read().split():
NameError: name 'file' is not defined

Process finished with exit code 1

How can I make this work? Thank you.

import glob
import os
path = 'C:\Python\Anaconda3'
for filename in glob.glob(os.path.join(path, '*.txt')):
    wordcount = {}
    for word in file.read().split():
        if word not in wordcount:
            wordcount[word] = 1
        else:
            wordcount[word] += 1
print(word, wordcount)

Solution

  • As the code stands, you have three obvious errors (although there may be more).

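    1. The outer loop variable is named filename, but the inner loop reads from file, a name that is never defined. This is what raises the NameError. Note that filename is only a string holding the path, so the file also has to be opened before it can be read.

      for filename in glob.glob(os.path.join(path, '*.txt')):   # defined here as filename
          ...
          for word in file.read().split():                      # used here as file
              ...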
      
    2. The wordcount dictionary is re-initialized (and thus erased) on each iteration of the outer for loop, so only the last file's counts survive. You can fix this in one of two ways, depending on what you want:

      a. Move the line wordcount = {} above the for loops so the dictionary is not cleared for each file. This gives you a total word count across all files.

      b. Store wordcount in a second dictionary, filecounts, at the end of each iteration of the outer loop. You then have a dictionary whose keys are filenames and whose values are the per-file word-count dictionaries. This can be a bit confusing, because you now have a dictionary of dictionaries; an individual count is looked up as filecounts[filename][word].

    3. The print call sits outside both loops, so it prints only the last word followed by the whole dictionary. To print one word per line, consider the following instead:

      for word in wordcount:
          print('{word}:\t{count}'.format(word=word, count=wordcount[word]))
      

    I would also suggest using a defaultdict from the collections module (see the docs). This eliminates the need to check whether a word is already in the dictionary and to set it to 1 if it is not.
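
As a small illustration of that suggestion, here is the counting loop written both ways. This is only a sketch: the sample sentence and the names plain and auto are made up for the comparison and are not part of your script.

```python
from collections import defaultdict

text = 'the cat and the hat'  # made-up sample input

# Explicit membership check, as in the question.
plain = {}
for word in text.split():
    if word not in plain:
        plain[word] = 1
    else:
        plain[word] += 1

# defaultdict(int) supplies 0 for a missing key, so += 1 works directly.
auto = defaultdict(int)
for word in text.split():
    auto[word] += 1

print(plain == dict(auto))  # compares the results of the two loops
```

The only behavioural difference to keep in mind is that merely reading a missing key from a defaultdict creates that key, so use `in` or `.get()` if you only want to test for a word.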

    Putting it all together, I would write:

    from collections import defaultdict
    import glob
    import os

    # Raw string so the backslashes in the Windows path are taken literally
    path = r'C:\Python\Anaconda3'
    filecounts = {}

    for filename in glob.glob(os.path.join(path, '*.txt')):
        wordcount = defaultdict(int)
        # filename is just a path string, so open the file before reading it
        with open(filename) as f:
            for word in f.read().split():
                wordcount[word] += 1

        filecounts[filename] = wordcount

    for filename in filecounts:
        print("Word count for file '{file}'".format(file=filename))
        for word in filecounts[filename]:
            print('\t{word}:\t{count}'.format(word=word, count=filecounts[filename][word]))
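
If you want both results from point 2 at once (a total across all files and a count per file), one possible variant swaps the hand-written loop for collections.Counter. Treat this as a sketch under assumptions rather than a drop-in replacement: the function name count_words, its pattern parameter, and the utf-8 encoding are my own choices and may not match your files.

```python
from collections import Counter
import glob
import os


def count_words(folder, pattern='*.txt'):
    """Return (per_file, total) word counts for files in folder matching pattern."""
    per_file = {}
    total = Counter()
    for filename in sorted(glob.glob(os.path.join(folder, pattern))):
        # The encoding is an assumption; adjust it to match your data.
        with open(filename, encoding='utf-8') as f:
            counts = Counter(f.read().split())
        per_file[os.path.basename(filename)] = counts  # option (b): one counter per file
        total.update(counts)                           # option (a): running total
    return per_file, total
```

You would call it as `per_file, total = count_words(r'C:\Python\Anaconda3')`, and `total.most_common(10)` then lists the ten most frequent words across the folder. Note that split() separates on whitespace only, so 'word' and 'word,' are counted as different words here, just as in the code above.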