I'm working on analyzing some user data, and I've got a list of (preprocessed to lowercase) usernames, something like this:
name_list = ['joebob', 'sallycat', 'bigbenny', 'davethepirate', 'nightninja', ...(many more)]
and a dictionary of comparisons I'd like to run on those names to see how often certain words show up compared to certain others. For example...
comparisons = {"Pirates vs Ninjas": ["pirate", "ninja"],
"Cats vs Dogs": ["cat", "dog"]}
I'm trying to write a loop/comprehension whose output would look like this:
{"Pirates vs Ninjas": {"pirate": 224, "ninja": 342},
"Cats vs Dogs": {"cat": 430, "dog": 391}}
(The numbers above are just examples of the resulting word counts.)
I know all the individual components needed to make it work (dictionary comprehensions and dict.get). What is the right way to put it all together?
Edit for clarification: I want to see how many usernames contain the word "cat" and record that next to the number that contain the word "dog". The results will be stored in a dict under the key "Cats vs Dogs". I would then do the same for the next comparison, "Pirates vs Ninjas".
from collections import Counter
c = Counter(user_names)
result = {category: {entry: c[entry] for entry in entries}
for category, entries in comparisons.items()}
First, run a Counter over the list to get a username -> count mapping, then use nested dict comprehensions over comparisons. The Counter returns 0 if entry doesn't exist in it, so missing words don't raise a KeyError.
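Since the question mentions dict.get, here is a minimal sketch of the same exact-match approach using a plain dict instead of Counter. It assumes the comparisons dict from the question and the sample user_names used later in this answer; dict.get(entry, 0) supplies the 0 default that Counter gives you for free.

```python
# Exact-match counting with a plain dict and dict.get instead of Counter.
comparisons = {"Pirates vs Ninjas": ["pirate", "ninja"],
               "Cats vs Dogs": ["cat", "dog"]}
user_names = ["pirate", "dog", "this", "ninja", "that",
              "cat", "cat", "ninja", "other", "cat"]

# Build the username -> count mapping by hand.
counts = {}
for name in user_names:
    counts[name] = counts.get(name, 0) + 1

# counts.get(entry, 0) returns 0 for words that never appear,
# where counts[entry] would raise KeyError.
result = {category: {entry: counts.get(entry, 0) for entry in entries}
          for category, entries in comparisons.items()}
print(result)
```

Counter is essentially a dict subclass that does this bookkeeping for you, which is why the Counter version is shorter.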
For example, on the first iteration of the Counter-based comprehension:
category == "Pirates vs Ninjas"
entry == "pirate"
entries == ["pirate", "ninja"]
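If the nested comprehension is hard to read, the following sketch unrolls it into ordinary for loops so you can see where category, entries and entry get bound. It assumes the same comparisons and sample user_names as the rest of this answer and should produce the same result as the comprehension.

```python
from collections import Counter

comparisons = {"Pirates vs Ninjas": ["pirate", "ninja"],
               "Cats vs Dogs": ["cat", "dog"]}
user_names = ["pirate", "dog", "this", "ninja", "that",
              "cat", "cat", "ninja", "other", "cat"]
c = Counter(user_names)

result = {}
for category, entries in comparisons.items():  # e.g. "Pirates vs Ninjas", ["pirate", "ninja"]
    inner = {}
    for entry in entries:                       # e.g. "pirate"
        inner[entry] = c[entry]                 # Counter gives 0 for unseen words
    result[category] = inner
print(result)
```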
Sample data:
user_names = ["pirate", "dog", "this", "ninja", "that", "cat", "cat", "ninja", "other", "cat"]
c = Counter(user_names)
result = {category: {entry: c[entry] for entry in entries}
for category, entries in comparisons.items()}
Then:
>>> result
{'Pirates vs Ninjas': {'pirate': 1, 'ninja': 2}, 'Cats vs Dogs': {'cat': 3, 'dog': 1}}
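Keep in mind that Counter only counts usernames that are exactly equal to an entry, while the question asks about usernames that contain a word. A quick check with a few names copied from the question's name_list shows the difference; this prints 0 1, because "sallycat" is not equal to "cat" but does contain it.

```python
from collections import Counter

# A few usernames taken from the question's name_list.
name_list = ['joebob', 'sallycat', 'bigbenny', 'davethepirate', 'nightninja']
c = Counter(name_list)

exact = c["cat"]                                    # names equal to "cat"
partial = sum("cat" in name for name in name_list)  # names containing "cat"
print(exact, partial)
```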
If you want case-insensitive and partial matches, don't use Counter; use sum instead:
result = {category: {entry: sum(entry in name for name in user_names)
for entry in map(str.lower, entries)}
for category, entries in comparisons.items()}
Here we first map the entries to lowercase before searching (the usernames are already lowercase, per the question), and we count not only exact matches but also "contains"-type matches via the in operator and sum.
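The sum trick works because True and False count as 1 and 0 when added. The sketch below wraps the partial-match version in a helper (count_contains is a hypothetical name, not part of the original answer) and also lowercases each username, in case your input is not already lowercase as it is in the question. The mixed-case sample names are made up for illustration.

```python
comparisons = {"Pirates vs Ninjas": ["pirate", "ninja"],
               "Cats vs Dogs": ["cat", "dog"]}

def count_contains(names, comparisons):
    """For each comparison, count how many names contain each word (case-insensitive)."""
    lowered = [name.lower() for name in names]  # normalize usernames once
    return {category: {word: sum(word in name for name in lowered)
                       for word in map(str.lower, entries)}
            for category, entries in comparisons.items()}

# Hypothetical sample data with mixed case.
names = ['joebob', 'SallyCat', 'bigbenny', 'davethepirate', 'NightNinja', 'catdog']
result = count_contains(names, comparisons)
print(result)
```

Note that a username such as "catdog" is counted under both "cat" and "dog"; if the two sides of a comparison should be mutually exclusive, you would need an extra rule for that.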