Search code examples
pythonlistcollectionscounter

How to find the most common name in 2 related lists


I would like to seek help from the community.. I have 2 related lists here:

names = ['alan_grant', 'alan_grant', 'alan_grant', 'alan_grant', 'alan_grant', 'claire_dearing', 'claire_dearing', 'claire_dearing', 'claire_dearing', 'claire_dearing', 'ellie_sattler', 'ellie_sattler', 'ellie_sattler', 'ellie_sattler', 'ellie_sattler', 'ian_malcolm', 'ian_malcolm', 'ian_malcolm', 'ian_malcolm', 'ian_malcolm', 'john_hammond', 'john_hammond', 'john_hammond', 'john_hammond', 'john_hammond', 'owen_grady', 'owen_grady', 'owen_grady', 'owen_grady', 'owen_grady']
votes = [True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, True, True, True]

The votes list is a result of a facial recognition algorithm matching from the corresponding names list. Then I shall link each True vote to the corresponding name, and find the most frequently occurred name to be the final 'winner'.

I have tried 2 ways:

characters = {}
for name, vote in list(zip(names, votes)):
    if vote == True:
        characters[name] = characters.get(name, 0) + 1
#print(characters)
print(max(characters, key=characters.get))

The output is 'owen_grady'

from collections import Counter

characters = [name for name, vote in list(zip(names, votes)) if vote == True]
#print(characters)
print(Counter(characters).most_common()[0][0])

The output is also 'owen_grady'. Which way is more efficient: Dictionary? or List Comprehension with Counter?

My ultimate question: is there another way (the most efficient) to get the result? I would like the output to be just 'owen_grady'


Solution

  • You can use itertools.compress() to filter all false entries. Option with Counter should be most efficient, just use n argument in .most_common() to let it return a single pair.

    Code:

    from itertools import compress
    from collections import Counter
    
    names = ['alan_grant', 'alan_grant', 'alan_grant', 'alan_grant', 'alan_grant', 'claire_dearing', 'claire_dearing', 'claire_dearing', 'claire_dearing', 'claire_dearing', 'ellie_sattler', 'ellie_sattler', 'ellie_sattler', 'ellie_sattler', 'ellie_sattler', 'ian_malcolm', 'ian_malcolm', 'ian_malcolm', 'ian_malcolm', 'ian_malcolm', 'john_hammond', 'john_hammond', 'john_hammond', 'john_hammond', 'john_hammond', 'owen_grady', 'owen_grady', 'owen_grady', 'owen_grady', 'owen_grady']
    votes = [True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, True, True, True]
    
    most_common = Counter(compress(names, votes)).most_common(1)[0][0]
    # Or with some syntax sugar:
    # [(most_common, _)] = Counter(compress(names, votes)).most_common(1)
    

    Upd. I've made some benchmarks and it seems like for this particular case slightly optimized first method demonstrates better performance:

    from itertools import compress
    
    names = ['alan_grant', 'alan_grant', 'alan_grant', 'alan_grant', 'alan_grant', 'claire_dearing', 'claire_dearing', 'claire_dearing', 'claire_dearing', 'claire_dearing', 'ellie_sattler', 'ellie_sattler', 'ellie_sattler', 'ellie_sattler', 'ellie_sattler', 'ian_malcolm', 'ian_malcolm', 'ian_malcolm', 'ian_malcolm', 'ian_malcolm', 'john_hammond', 'john_hammond', 'john_hammond', 'john_hammond', 'john_hammond', 'owen_grady', 'owen_grady', 'owen_grady', 'owen_grady', 'owen_grady']
    votes = [True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, True, True, True]
    
    characters = list(compress(names, votes))
    most_common = max(set(characters), key=characters.count)