python, loops, dictionary, nlp, n-gram

How can I merge values in one dictionary as an item to another dictionary created inside a "for" loop?


I am struggling with lists, loops and dictionaries.

Basically, I have two 'levels' of dictionary. The first maps each word to the number of times it occurs in a given sentence, like this:

wordcount={'sort': 3, 'count': 3, 'wrap': 3, 'coin': 11}

Next, I have a for loop that goes through a list of just these words, and for each one creates a dictionary with a number of additional attributes, namely that word's occurrence according to Google N-grams:

import json
from time import sleep

import numpy as np
import requests

for word in wordlist:
    # Build the Google N-grams query for the current word
    url = f"https://books.google.com/ngrams/json?content={word}&year_start=1965&year_end=1975&corpus=26&smoothing=3"
    sleep(1)  # pause between requests
    resp = requests.get(url)

    if resp.ok:
        results = json.loads(resp.content)[0]
        # Keep only the "ngram" and "timeseries" entries of the response
        results_clean = {key: val for key, val in results.items() if key == "ngram" or key == "timeseries"}
        timeseries = {key: results_clean[key] for key in results_clean.keys() & {'timeseries'}}
        timeseriesvalues = list(timeseries.values())
        timeseriesmean = np.mean(timeseriesvalues)
        ngramsonly = {key: results_clean[key] for key in results_clean.keys() & {'ngram'}}
        ngramsvalues = list(ngramsonly.values())
        results_nouns_final = {"word": ngramsvalues, "occurrence_mean": timeseriesmean}

Basically, I want to add that word's count from wordcount to results_nouns_final. However, when I try to do so by adding the word's value from wordcount as a third item to this dictionary (as follows):

results_nouns_final={"word": ngramsvalues, "occurrence_mean": timeseriesmean, "count": wordcount.items()}

It adds every word's count instead, giving me something like the following:

{'word': ['sort'], 'occurrence_mean': 5.319996372468642e-05, 'count': dict_items([('sort', 3), ('count', 3), ('wrap', 3), ('coin', 11)])}
{'word': ['count'], 'occurrence_mean': 4.5438979543294875e-05, 'count': dict_items([('sort', 3), ('count', 3), ('wrap', 3), ('coin', 11)])}
...etc.

Could anybody let me know where I am going wrong? My desired output would be something like the following:

{'word': ['sort'], 'occurrence_mean': 5.319996372468642e-05, 'count': 3}
{'word': ['count'], 'occurrence_mean': 4.5438979543294875e-05, 'count': 3}

Solution

  • wordcount.items() returns every (word, count) pair in the wordcount dictionary, no matter which word the loop is currently on. You only want the count for the current word. Try replacing the last line in your loop with:

    results_nouns_final={"word": ngramsvalues, "occurrence_mean": timeseriesmean, "count": wordcount.get(word)}
    

    wordcount.get(word) gives you the count of word from your dictionary, or None if the word is not in wordcount.

    If you're sure every word exists in your dictionary, you could just use wordcount[word], which raises a KeyError for a missing word.
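
The difference between these lookups can be seen in a small sketch that reuses the wordcount dictionary from the question. The word list (including the extra word 'mint') and the build_record helper are made up for illustration, and occurrence_mean is left out because it depends on the live N-grams request.

```python
wordcount = {'sort': 3, 'count': 3, 'wrap': 3, 'coin': 11}

# Hypothetical word list; 'mint' is deliberately missing from wordcount.
wordlist = ['sort', 'coin', 'mint']


def build_record(word, counts):
    """Return the per-word part of the record (illustrative helper, not from the question)."""
    return {"word": [word], "count": counts.get(word)}


records = [build_record(word, wordcount) for word in wordlist]
for record in records:
    print(record)

# .items() ignores the loop variable and always returns every (word, count) pair.
all_pairs = list(wordcount.items())

# .get() accepts a default to use instead of None for missing words.
missing_with_default = wordcount.get('mint', 0)

# Square-bracket indexing raises KeyError for a missing word.
try:
    wordcount['mint']
    raised = False
except KeyError:
    raised = True
```

If a missing count should be treated as zero rather than None, passing a default as in wordcount.get(word, 0) is probably the simplest option.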