python dictionary nlp word2vec

Converting a list to a dictionary for a word2vec model


I have a huge corpus of text in a file that I want to use to train a skip-gram model. I have split the file's contents into a list of words. Now I want to count how often each word occurs and build a dictionary with the word as the key and its frequency as the value. Here is a snippet of my code:

import collections

with open("enwik8", "r") as data:
    words = data.read().split()

vocabulary_size = 5000

# Start with a placeholder entry for unknown words, then append the most common (word, count) pairs.
count = [['UNK', -1]]
count.extend(collections.Counter(words).most_common(vocabulary_size - 1))

I have successfully built a list of the most common words and their frequencies (up to vocabulary_size entries). Now I need to put them into a dictionary, with the word as the key and its frequency as the value.

dictionary = dict()
for word, _ in count:

Can anyone help me finish this?


Solution

  • Assuming you already have the list of (word, count) pairs, here is how to build the dictionary you need:

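    word_dict = dict()
    # count is the list of (word, frequency) pairs built in the question
    for word_count in count:
        # keep the first frequency seen for each word
        if word_count[0] not in word_dict:
            word_dict[word_count[0]] = word_count[1]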
    

    Your list contains tuples, so word_count[0] (the word) becomes the key in the dictionary and word_count[1] (its count) becomes the value for that key.
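
A shorter variant may also work, since dict() accepts any sequence of two-item pairs. This is only a sketch: the tiny words list and the small vocabulary_size are made-up stand-ins for the real corpus, and counting the leftover words under 'UNK' is an assumption on my part (the question uses -1 as a placeholder there).

```python
import collections

# Hypothetical stand-in for the token list read from the corpus file.
words = "the cat sat on the mat the cat".split()
vocabulary_size = 3  # small value for illustration only

# Keep the (vocabulary_size - 1) most frequent words, as in the question.
top_words = collections.Counter(words).most_common(vocabulary_size - 1)

# Each element is a (word, frequency) pair, so dict() builds the mapping directly.
word_dict = dict(top_words)

# Assumption: count every word outside the vocabulary under the 'UNK' key.
word_dict['UNK'] = len(words) - sum(word_dict.values())

print(word_dict)  # {'the': 3, 'cat': 2, 'UNK': 3}
```

If you need the frequency of every word rather than just the most common ones, the Counter object itself already behaves like a dictionary, so collections.Counter(words)['the'] returns the count directly.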