Search code examples
pythonlistdata-structuresword-frequency

Can someone explain this word frequency count, please?


I'm taking the MIT DS&A algorithm course and on the document distance problem, we have to parse a file into a list of words, then count the frequency of each word in the file. I have a hard time comprehending the following function:

def count_frequency(word_list):
    """
    Return a list giving pairs of form: (word,frequency)
    """
    L = []
    for new_word in word_list:
        for entry in L:
            if new_word == entry[0]:
                entry[1] = entry[1] + 1
                break
        else:
            L.append([new_word,1])
    return L
  1. Why do we compare new_word to entry[0]? a. What if L is empty? What do we compare new_word to? b. Why do we compare new_word to entry[0] specifically? Why don't we do something like if new_word in L c. Why do we need to use break?
  2. Why is the else block 1 tab to the right of the earlier if block? When I tried to indent the else block, an indentation error would show up.

Thank you for your help!


Solution

  • The list L contains two-item entries due to L.append([new_word,1]). If L is empty the for would not be entered, so there is no problem with entry[0].

    entry[0] is a word and entry[1] is a count. You can't say if new_word in L because it is not just a list of strings.

    break stops the for once a word is found.

    for/else is a thing in Python. The else runs if the for completes without interruption (a break in this case). If new_word isn't in L, the for won't break and the new word and a count of 1 is added to L.

    FYI, the built-in collections.Counter() would return similar results.