I have a list called data and want to build a dict object called word_count from it. Before converting the frequencies into unique integers, I want to return word_count (expected format: {'marjori': 1, 'splendid': 1, ...}) and then sort the words by frequency.
data = [['marjori', 'splendid'],
        ['rivet', 'perform', 'farrah', 'fawcett']]
def build_dict(data, vocab_size=5000):
    word_count = {}
    for w in data:
        word_count.append(data.count(w)) ????
    #print(word_count)
    # How can I sort the words so that sorted_words[0] is the most frequently
    # appearing word and sorted_words[-1] is the least frequently appearing word?
    sorted_words = ??
I'm new to Python. Can someone help me? Thanks in advance. (I only want to use the numpy library and for loops.)
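For each word, you need to create a dict entry if it doesn't exist yet, or add 1 to its value if it does exist. Note that data is a list of lists, so you need an inner loop to reach the individual words (a list itself cannot be used as a dict key):
word_count = dict()
for sentence in data:
    for w in sentence:
        if w in word_count:
            word_count[w] += 1
        else:
            word_count[w] = 1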
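The same counting loop can be written a little more compactly with the default argument of dict.get. This is a minimal sketch, assuming data is a list of word lists as in the question; the repeated word in the sample is an illustrative addition so that the counts are visible:

```python
# Sample input: a list of word lists. The repeated 'splendid' is an
# illustrative addition, not part of the question's data.
data = [['marjori', 'splendid'],
        ['rivet', 'perform', 'splendid']]

word_count = {}
for sentence in data:
    for w in sentence:
        # get() returns 0 for a word not seen yet, so no if/else is needed
        word_count[w] = word_count.get(w, 0) + 1

print(word_count)
# {'marjori': 1, 'splendid': 2, 'rivet': 1, 'perform': 1}
```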
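Then you can sort your dictionary by value, most frequent first (a dict keeps insertion order in Python 3.7+, so the rebuilt dict stays sorted):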
word_count = {k: v for k, v in sorted(word_count.items(), key=lambda item: item[1], reverse=True)}
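Putting the counting and sorting together, the build_dict function from the question might look roughly like the sketch below. It returns both word_count and a sorted_words list, which is one possible reading of what the question asks for. Treating vocab_size as a cap on the number of words kept is an assumption, and the third word list in the sample is added only to create some repeated words:

```python
def build_dict(data, vocab_size=5000):
    """Count words in a list of word lists and rank them by frequency."""
    word_count = {}
    for sentence in data:
        for w in sentence:
            word_count[w] = word_count.get(w, 0) + 1

    # Most frequent first; sorted() is stable, so ties keep first-seen order.
    sorted_words = sorted(word_count, key=word_count.get, reverse=True)

    # Assumption: vocab_size caps how many of the most frequent words are kept.
    return word_count, sorted_words[:vocab_size]


data = [['marjori', 'splendid'],
        ['rivet', 'perform', 'farrah', 'fawcett'],
        ['splendid', 'perform', 'splendid']]  # third list is illustrative

word_count, sorted_words = build_dict(data)
print(sorted_words[0])   # splendid
print(sorted_words[-1])  # fawcett
```

Here sorted_words[0] is the most frequent word and sorted_words[-1] is one of the least frequent ones; words with equal counts stay in the order they first appeared.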