
Finding modes for multiple dictionary keys


I have a Python dictionary, built from a CSV file, in which each key maps to a list of values. It looks something like this:

{
'hours': ['4', '2.4', '5.8', '2.4', '7'],
'name': ['Adam', 'Bob', 'Adam', 'John', 'Harry'],
'salary': ['55000', '30000', '55000', '30000', '80000']
}

(The actual dictionary is significantly larger in both keys and values.)

I want to find the mode* of each list of values, with the stipulation that a list in which every value occurs only once does not need a mode. However, I'm not sure how to go about this, and I can't find any similar examples. I'm also concerned about the different (implied) data types of each list (e.g. 'hours' values are floats, 'name' values are strings, and 'salary' values are integers). I've included a rudimentary conversion function, but I'm not using it yet.

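import csv

f = 'blah.csv'

# Conducts type conversion: returns a float if the string parses as one,
# otherwise returns the string unchanged
def conversion(value):
    try:
        value = float(value)
    except ValueError:
        pass
    return value

# Places csv into a dictionary mapping each column name to a list of its values
csv_dict = {}
with open(f, newline='') as csv_file:
    reader = csv.DictReader(csv_file)
    for row in reader:
        for column, value in row.items():
            csv_dict.setdefault(column, []).append(value.strip())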

*I also want to attempt other calculations, such as averages and quartiles (which is why I'm concerned about data types), but for now I'd mostly like help with modes.
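For the averages and quartiles mentioned above, one possible approach is to run each column through the question's conversion function and only summarize columns where every value became a float. This is a minimal sketch, assuming Python 3.8+ (for statistics.quantiles); the sample data and the summary dictionary layout are hypothetical.

```python
import statistics


# The question's conversion function: float if possible, otherwise the original string
def conversion(value):
    try:
        value = float(value)
    except ValueError:
        pass
    return value


# Hypothetical sample data in the same shape as csv_dict from the question
csv_dict = {
    'hours': ['4', '2.4', '5.8', '2.4', '7'],
    'name': ['Adam', 'Bob', 'Adam', 'John', 'Harry'],
    'salary': ['55000', '30000', '55000', '30000', '80000'],
}

summary = {}
for column, values in csv_dict.items():
    converted = [conversion(v) for v in values]
    # Only treat a column as numeric if every value converted to a float
    if all(isinstance(v, float) for v in converted):
        summary[column] = {
            'mean': statistics.mean(converted),
            # quantiles(n=4) returns the three quartile cut points (Python 3.8+)
            'quartiles': statistics.quantiles(converted, n=4),
        }

print(summary)
```

Note that conversion turns integer-looking values such as salaries into floats (55000.0); trying int() before float() would keep them as integers if that matters. Also, float() accepts strings such as 'nan' and 'inf', so a text column containing only those would be treated as numeric.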

EDIT: the input CSV file can change; I'm unsure whether this affects potential solutions.


Solution

  • Ignoring all the CSV file handling, which seems tangential to your question, let's say you have a list called salary. You can use the Counter class from collections to count how many times each unique list element occurs.

    From there, you have several options for getting from a Counter to your mode.

    For example:

    from collections import Counter
    
    salary = ['55000', '30000', '55000', '30000', '80000']
    
    counter = Counter(salary)
    
    # This returns all unique list elements and their count, sorted by count, descending
    mc = counter.most_common()
    print(mc)
    
    # This returns the unique list elements and their count, where their count equals
    #   the count of the most common list element (i.e. every tied mode).
    gmc = [(k, c) for (k, c) in mc if c == mc[0][1]]
    print(gmc)
    
    # If you just want a single (list element, count) pair with the most occurrences;
    #   ties go to whichever element was encountered first
    amc = counter.most_common(1)[0]
    print(amc)
    

    For the salary list in the code, this outputs:

    [('55000', 2), ('30000', 2), ('80000', 1)]  # mc
    [('55000', 2), ('30000', 2)]                # gmc
    ('55000', 2)                                # amc
    

    Of course, for your case you'd probably use Counter(csv_dict["salary"]) instead of Counter(salary).
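Building on the Counter approach above, here is a minimal sketch that handles every column at once and returns no mode when every value occurs only once, as the question asks. The helper name modes and the sample data are hypothetical, not part of the original answer.

```python
from collections import Counter

# Hypothetical sample data in the same shape as csv_dict from the question
csv_dict = {
    'hours': ['4', '2.4', '5.8', '2.4', '7'],
    'name': ['Adam', 'Bob', 'Adam', 'John', 'Harry'],
    'salary': ['55000', '30000', '55000', '30000', '80000'],
}


def modes(values):
    """Return every value tied for the highest count.

    Returns an empty list when no value occurs more than once,
    which implements the "no mode if all values are unique" rule.
    """
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []
    # Counter preserves insertion order, so ties come out in first-seen order
    return [value for value, count in counts.items() if count == top]


# Loops over whatever columns the CSV happens to have
column_modes = {column: modes(values) for column, values in csv_dict.items()}

for column, result in column_modes.items():
    print(column, result)
```

Because it iterates over csv_dict.items() rather than naming columns, it should keep working if the CSV's columns change. The modes are computed on the raw strings, so '4' and '4.0' would be counted separately; running the values through the question's conversion function first would merge them. On Python 3.8+, statistics.multimode(values) returns the same list of tied modes, but it does not apply the all-unique rule by itself.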