python text-files word-frequency

Loop through multiple txt files and count frequencies of chosen word in Python


I have an exercise problem that asks me to write a function that loops through 50 text files and counts how often a chosen word occurs in each one. My code currently looks like this:

def count(term):
    frequencies = 0
    
    work_dir = "C:/my_work_directory"
    for i in range(1, 51):
        name = "chapter-{i}.txt".format(i=i)
        path = os.path.join(work_dir, name)
        with io.open(path, "r") as fd:
            content = fd.read()
    
        chapter = io.StringIO(content)
        line = chapter.readline()
        print(chapter)
        while line:
            lower = line.lower()
            cleaned = re.sub('[^a-z ]','', lower)
            words = cleaned.strip().split(' ')
            for word in words:
                if word == term:
                    frequencies += 1
            line = chapter.readline()
        
        print(frequencies)

The output I want is that, if I call count("Man"), I get 50 separate counts, one per text file, of how often the word "Man" occurs in that file. However, all I am getting at the moment is 50 zeros. I am fairly sure this is because I have initialised the variable 'frequencies' to 0 and then never changed it. Can anyone help me fix this or tell me where I am going wrong? Any help would be greatly appreciated, thank you.


Solution

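  • Your 'Man' has a capital letter, while every word read from the files is lowercased before the comparison, so word == term never matches and the count stays at zero. The first fix is to lowercase the search term as well, with term = term.lower() (lower() returns a new string, so the result has to be assigned back). The second problem, which you would only notice once the first is fixed, is that you are keeping a running total across all files instead of a per-file count. Move the initialisation of 'frequencies' into the for loop so that it resets for each file. It should look something like this: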

    import io
    import os
    import re

    def count(term):
        # The file text is lowercased before comparing, so lowercase the term too.
        term = term.lower()

        work_dir = "C:/my_work_directory"
        for i in range(1, 51):
            # Reset the counter for every file to get a per-file count.
            frequencies = 0

            name = "chapter-{i}.txt".format(i=i)
            path = os.path.join(work_dir, name)
            with io.open(path, "r") as fd:
                content = fd.read()

            chapter = io.StringIO(content)
            line = chapter.readline()
            while line:
                lower = line.lower()
                # Drop everything except lowercase letters and spaces.
                cleaned = re.sub('[^a-z ]', '', lower)
                words = cleaned.strip().split(' ')
                for word in words:
                    if word == term:
                        frequencies += 1
                line = chapter.readline()

            # Print one count per file.
            print(frequencies)
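
As a possible variation on the code above, here is a minimal sketch that applies the same two fixes (lowercasing the term and resetting the counter per file) but returns the counts as a list instead of printing them, which makes the function easier to check. The helper name count_per_file, its work_dir and n_files parameters, the UTF-8 encoding, and the two throwaway sample files are all assumptions made for illustration; they are not part of the original exercise.

```python
import os
import re
import tempfile


def count_per_file(term, work_dir, n_files=50):
    """Return one count per chapter file (hypothetical helper, not from the exercise)."""
    term = term.lower()  # the text is lowercased below, so the term must be too
    counts = []
    for i in range(1, n_files + 1):
        path = os.path.join(work_dir, "chapter-{i}.txt".format(i=i))
        frequency = 0  # reset for every file, so each count is per-file
        # Assumption: the files are UTF-8 encoded.
        with open(path, "r", encoding="utf-8") as fd:
            for line in fd:
                # Same cleaning as above: keep only lowercase letters and spaces.
                cleaned = re.sub("[^a-z ]", "", line.lower())
                frequency += cleaned.split().count(term)
        counts.append(frequency)
    return counts


# Small demo with two throwaway files standing in for the 50 chapters.
with tempfile.TemporaryDirectory() as demo_dir:
    samples = ["The man saw a Man. Mankind!", "No match here."]
    for i, text in enumerate(samples, start=1):
        sample_path = os.path.join(demo_dir, "chapter-{i}.txt".format(i=i))
        with open(sample_path, "w", encoding="utf-8") as fd:
            fd.write(text)
    demo_counts = count_per_file("Man", demo_dir, n_files=2)
    print(demo_counts)  # [2, 0]
```

Note that "Mankind" is not counted, because whole words are compared rather than substrings. For the real exercise, the call would be something like count_per_file("Man", "C:/my_work_directory").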