python dictionary-comprehension

Python loop/comprehension for a nested word count


I'm analyzing some user data, and I have a list of usernames (already lowercased), something like this:

name_list = ['joebob', 'sallycat', 'bigbenny', 'davethepirate', 'nightninja', ...(many more)]

I also have a dictionary of comparisons I'd like to run on those names, to see how often certain words show up compared to others. For example:

comparisons = {"Pirates vs Ninjas": ["pirate", "ninja"],
               "Cats vs Dogs": ["cat", "dog"]}

I'm looking for a loop or comprehension that produces output like this:

{"Pirates vs Ninjas": {"pirate": 224, "ninja": 342},
 "Cats vs Dogs": {"cat": 430, "dog": 391}}

(The numbers above are just examples of the resulting word counts.)

I know all the individual components necessary to make it work (dictionary comprehensions and dict.get). What is the right way to put it all together?

Edit for clarification: I want to see how many usernames contain the word "cat", and record that next to the number that contain the word "dog". The results will be stored in a dict under the key "Cats vs Dogs". I would then do the same with the next comparison, "Pirates vs Ninjas".


Solution

  • from collections import Counter
    
    # user_names is the list of usernames (name_list in the question)
    c = Counter(user_names)
    
    result = {category: {entry: c[entry] for entry in entries}
              for category, entries in comparisons.items()}
    

    This first runs a Counter over the list to get a username -> count mapping, then uses nested dict comprehensions over comparisons to look up each entry. A Counter returns 0 for an entry that doesn't exist in it. Note that this counts only usernames that are exactly equal to an entry.

    Above, for example:

    • category == "Pirates vs Ninjas"
    • entry == "pirate"
    • entries == ["pirate", "ninja"]
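
If the nested comprehension is hard to read, it can be unrolled into plain loops. This is only a sketch: `comparisons` is taken from the question, while the `user_names` list here is made up for illustration (it deliberately contains no "dog" so the 0 default is visible).

```python
from collections import Counter

# `comparisons` is from the question; `user_names` is made-up sample data.
comparisons = {"Pirates vs Ninjas": ["pirate", "ninja"],
               "Cats vs Dogs": ["cat", "dog"]}
user_names = ["pirate", "cat", "cat", "ninja", "ninja", "other"]

c = Counter(user_names)  # exact username -> number of occurrences

# Loop form of the nested dict comprehension.
result = {}
for category, entries in comparisons.items():
    inner = {}
    for entry in entries:
        inner[entry] = c[entry]  # a Counter returns 0 for missing keys
    result[category] = inner

# The comprehension form builds the same dict.
comprehension = {category: {entry: c[entry] for entry in entries}
                 for category, entries in comparisons.items()}
```

Both forms build the same nested dict, so the choice between them is mostly about readability.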

    Sample data:

    user_names = ["pirate", "dog", "this", "ninja", "that", "cat", "cat", "ninja", "other", "cat"]
    
    c = Counter(user_names)
    
    result = {category: {entry: c[entry] for entry in entries}
              for category, entries in comparisons.items()}
    

    then

    >>> result
    {'Pirates vs Ninjas': {'pirate': 1, 'ninja': 2}, 'Cats vs Dogs': {'cat': 3, 'dog': 1}}
    

    To allow case-insensitive and partial ("contains") matches, which is what the edit to the question asks for, use sum instead of Counter:

    result = {category: {entry: sum(entry in name for name in user_names)
                         for entry in map(str.lower, entries)}
              for category, entries in comparisons.items()}
    

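    Here the entries are lowercased before searching (the usernames are already lowercase), and the in operator counts every username that contains the entry rather than only exact matches. Each entry in name test evaluates to True or False, which sum adds up as 1 or 0.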
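
To make the difference between the two approaches concrete, here is a small runnable sketch using only the five usernames shown in the question (the real list is longer). The helper name `count_containing` is made up for this example and is not part of any library.

```python
from collections import Counter

# Only the five usernames visible in the question; the real list has many more.
name_list = ["joebob", "sallycat", "bigbenny", "davethepirate", "nightninja"]
comparisons = {"Pirates vs Ninjas": ["pirate", "ninja"],
               "Cats vs Dogs": ["cat", "dog"]}


def count_containing(names, comparisons):
    """Hypothetical helper: per category, count the names containing each word."""
    return {category: {word: sum(word in name for name in names)
                       for word in map(str.lower, words)}
            for category, words in comparisons.items()}


# Substring ("contains") counts.
substring_counts = count_containing(name_list, comparisons)

# Exact-match counts via Counter, for comparison.
exact = Counter(name_list)
exact_counts = {category: {word: exact[word] for word in words}
                for category, words in comparisons.items()}

print(substring_counts)
# {'Pirates vs Ninjas': {'pirate': 1, 'ninja': 1}, 'Cats vs Dogs': {'cat': 1, 'dog': 0}}
print(exact_counts)
# {'Pirates vs Ninjas': {'pirate': 0, 'ninja': 0}, 'Cats vs Dogs': {'cat': 0, 'dog': 0}}
```

The exact-match version finds nothing here because no username is literally "pirate" or "cat". One caveat with substring matching: it also counts names where the word appears by accident, for example "cat" inside "catherine".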