CORPUS = [
    'this is the first document',
    'this is the second document',
    'and this is the third document',
    'is this the first document ?'
]
doc = CORPUS
dic = {}
for sentence in doc:
    k = list(sentence.split())
    for term in k:
        count_term = k.count(term)
        if not dic[term]:
            dic[term] = count_term
        else:
            dic[term] += count_term
print(dic)
I want to count the frequency of each term across the sentences in the CORPUS list, so I tried to build a dictionary and store the counts in it, but I get KeyError: 'this'.
Could you explain why the error happened?
If I understand it correctly, your code can be simplified to:
from collections import Counter
print(Counter(" ".join(CORPUS).split()))
which yields
Counter({'this': 4,
         'is': 4,
         'the': 4,
         'first': 2,
         'document': 4,
         'second': 1,
         'and': 1,
         'third': 1,
         '?': 1})
So the idea is to first join all sentences into one long string, which avoids the explicit loop, and then let collections.Counter from the standard library count the occurrences of the individual words.
The reason for the error you get is well explained in the other two answers (I upvoted both of them) :)
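For completeness, here is a minimal sketch of the cause and one possible fix if you want to keep your own loop. Reading `dic[term]` for a key that is not in a plain dict yet raises KeyError, and that is what `if not dic[term]:` does on the very first word. Using `dict.get` with a default of 0 is my suggestion, not the only option (`collections.defaultdict(int)` would work as well):

```python
CORPUS = [
    'this is the first document',
    'this is the second document',
    'and this is the third document',
    'is this the first document ?',
]

# Reading a missing key from a plain dict raises KeyError.
# This is what `if not dic[term]:` runs into on the first word.
try:
    {}['this']
except KeyError as err:
    print('KeyError:', err)

# One possible fix: dict.get returns the default (0) for a missing key,
# so each occurrence of a word simply adds 1 to its running total.
dic = {}
for sentence in CORPUS:
    for term in sentence.split():
        dic[term] = dic.get(term, 0) + 1

print(dic)
```

Note that this adds 1 per occurrence instead of `k.count(term)`. Adding `k.count(term)` for every occurrence would overcount any word that appears more than once within the same sentence.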