I'm taking the MIT DS&A algorithm course and on the document distance problem, we have to parse a file into a list of words, then count the frequency of each word in the file. I have a hard time comprehending the following function:
def count_frequency(word_list):
"""
Return a list giving pairs of form: (word,frequency)
"""
L = []
for new_word in word_list:
for entry in L:
if new_word == entry[0]:
entry[1] = entry[1] + 1
break
else:
L.append([new_word,1])
return L
if new_word in L
c. Why do we need to use break
?else
block 1 tab to the right of the earlier if
block? When I tried to indent the else
block, an indentation error would show up.Thank you for your help!
The list L
contains two-item entries due to L.append([new_word,1])
. If L
is empty the for
would not be entered, so there is no problem with entry[0]
.
entry[0]
is a word and entry[1]
is a count. You can't say if new_word in L
because it is not just a list of strings.
break
stops the for
once a word is found.
for/else
is a thing in Python. The else
runs if the for
completes without interruption (a break
in this case). If new_word
isn't in L
, the for
won't break and the new word and a count of 1 is added to L
.
FYI, the built-in collections.Counter()
would return similar results.