python dictionary counter nested-lists

Count frequency of words inside a list in a dictionary


I have a list of common keywords:

common_keywords = ['dog', 'person', 'cat']

And a list of dictionaries, containing keywords and sometimes the common_keywords listed above:

people = [{'name':'Bob', 'keywords': ['dog', 'dog', 'car', 'trampoline']},
          {'name':'Kate', 'keywords': ['cat', 'jog', 'tree', 'flower']},
          {'name':'Sasha', 'keywords': ['cooking', 'stove', 'person', 'cat']}]

I would like to count the frequency of the common_keywords for each person, so the desired output would look something like:

counts = [{'name': 'Bob', 'counts': [{'dog': 2}]}, 
          {'name': 'Kate', 'counts': [{'cat': 1}]}, 
          {'name': 'Sasha', 'counts': [{'person': 1}, {'cat': 1}]}]

I am able to use dict(Counter()) to count the keywords and keep only those that appear in common_keywords, but I am struggling to link these counts back to the original name, as shown in the desired output, counts.

Current code (I think I am slowly getting there):

freq_dict = {}
for p in people:
    name = p['name']
    for c in p['keywords']:
        if c not in freq_dict:
            freq_dict[name] = {c: 1}
        else: 
            if c not in freq_dict[name]:
                freq_dict[c] = 1
            else:
                freq_dict[c] +=1

Solution

  • You can use a list comprehension along with collections.Counter, which counts the items in each nested list:

    from collections import Counter
    
    [{'name':i.get('name'),
      'keywords':[dict(Counter([j for j in i.get('keywords') 
                                if j in common_keywords]))]} for i in people]
    
    # Output:
    [{'name': 'Bob', 'keywords': [{'dog': 2}]},
     {'name': 'Kate', 'keywords': [{'cat': 1}]},
     {'name': 'Sasha', 'keywords': [{'person': 1, 'cat': 1}]}]
    

    1. First, the list comprehension rebuilds the original list of dicts, defining each key explicitly and reading its value with i.get('key'). This lets you work with the nested list stored under keywords.
    2. Iterate over that nested list and keep only the items that appear in common_keywords.
    3. Pass the filtered list to collections.Counter and convert the result to a plain dict with dict().
    4. Return it as a list with a single dict inside. Note that this groups all of a person's counts into one dict, rather than one dict per keyword as in the question's desired output.
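
The steps above can also be written as an explicit loop, which may be easier to adapt. The following is a minimal sketch, not the answer's own code: the helper name count_common is hypothetical, and it assumes you want the exact shape from the question (a 'counts' key holding one single-entry dict per keyword). It also turns common_keywords into a set, an optional change that makes each membership test faster.

```python
from collections import Counter

common_keywords = ['dog', 'person', 'cat']

people = [{'name': 'Bob', 'keywords': ['dog', 'dog', 'car', 'trampoline']},
          {'name': 'Kate', 'keywords': ['cat', 'jog', 'tree', 'flower']},
          {'name': 'Sasha', 'keywords': ['cooking', 'stove', 'person', 'cat']}]


def count_common(people, common_keywords):
    """Hypothetical helper: per-person counts of the common keywords."""
    common = set(common_keywords)  # assumption: keywords are hashable strings
    result = []
    for person in people:
        # Filter the nested list, then count what is left.
        tally = Counter(k for k in person['keywords'] if k in common)
        # Emit one single-entry dict per keyword, as in the question's
        # desired output (the answer above uses one combined dict instead).
        result.append({'name': person['name'],
                       'counts': [{word: n} for word, n in tally.items()]})
    return result


counts = count_common(people, common_keywords)
for row in counts:
    print(row)
```

Counter keeps keys in first-seen order on Python 3.7 and later, so for Sasha 'person' comes before 'cat'. If you prefer the single combined dict from the answer, replace the inner list comprehension with [dict(tally)].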