python list dictionary nlp

How to use a loop to get the word frequency of a list object and store it in a dict object?


I have a list called data and want to build a dict object called word_count that maps each word to its frequency (expected format: {'marjori': 1, 'splendid': 1, ...}). Before converting the frequencies into unique integers, I want to return word_count and then sort the words by frequency.

data = [['marjori',
 'splendid'],
 ['rivet',
 'perform',
 'farrah',
 'fawcett']]

def build_dict(data, vocab_size = 5000):

    word_count = {}
    for w in data:
        word_count.append(data.count(w)) ????
    #print(word_count)

    # How can I sort the words so that sorted_words[0] is the most frequently appearing word and sorted_words[-1] is the least frequently appearing word?

    sorted_words = ??

I'm new to Python. Can someone help me? Thanks in advance. (I only want to use the numpy library and for loops.)


Solution

  • For each word, you need to create a dict entry if it doesn't exist yet, or add 1 to its value if it does exist:

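    word_count = dict()
    # data is a list of lists, so loop over each inner list first;
    # iterating data directly would try to use a list as a dict key.
    for sentence in data:
        for w in sentence:
            if w in word_count:
                word_count[w] += 1
            else:
                word_count[w] = 1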
    

    Then you can sort your dictionary by value in descending order (dicts keep insertion order in Python 3.7+):

    word_count = {k: v for k, v in sorted(word_count.items(), key=lambda item: item[1], reverse=True)}
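Putting the two steps together, a minimal sketch of build_dict might look like the following. It assumes data is a list of word lists, as in the question. Returning both word_count and sorted_words as a tuple, and using vocab_size to cap the length of sorted_words, are assumptions for illustration (the question does not say how vocab_size is used). The sample data adds a repeated word, which is also hypothetical, so the ordering is visible.

```python
def build_dict(data, vocab_size=5000):
    """Count words in a list of word lists and rank them by frequency."""
    word_count = {}
    for sentence in data:              # each element of data is a list of words
        for w in sentence:
            # get() returns 0 for a word not seen yet, so no if/else is needed
            word_count[w] = word_count.get(w, 0) + 1

    # Most frequent word first; words with equal counts keep first-seen order
    # because sorted() is stable.
    sorted_words = sorted(word_count, key=word_count.get, reverse=True)

    # Assumption: vocab_size limits how many words are kept in the ranking.
    return word_count, sorted_words[:vocab_size]


# Hypothetical sample: the question's data plus one repeated word.
data = [['marjori', 'splendid'],
        ['rivet', 'perform', 'farrah', 'fawcett', 'splendid']]

word_count, sorted_words = build_dict(data)
print(word_count)
print(sorted_words[0], sorted_words[-1])
```

Here sorted_words is a plain list of words, so sorted_words[0] is the most frequent word and sorted_words[-1] is the least frequent one, which is what the question asks for.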