
Retrieve the ranking of elements in several lists to compute the weighted average of their ranking scores in Python


I have two dictionaries that I have sorted by value, so they are now represented as lists of (key, value) tuples. I would like to retrieve the ranking position of each element in each list and store it in a variable, so that I can then compute the weighted average of each element's ranking positions across both lists. Here is an example.

dict1 = {'class1': 15.17, 'class2': 15.95, 'class3': 15.95}

sorted_dict1 = [('class1', 15.17), ('class2', 15.95), ('class3', 15.95)]

sorted_dict2 = [('class2', 9.10), ('class3', 9.22), ('class1', 10.60)]

So far I can retrieve the ranking position of each element in a list and print it. But when I try to compute the weighted average of the ranking positions, i.e. (w1*a + w2*b)/(w1+w2), where "a" is the ranking position in sorted_dict1 and "b" is the ranking position in sorted_dict2, the numbers I get are not the correct weighted averages.

I have tried various things; here is one attempt:

for idx, val in list(enumerate(sorted_dict1, 1)):
    for idx1, val1 in list(enumerate(sorted_dict2, 1)):
         position_dict1 = idx
         position_dict2 = idx1
    weighted_average = float((0.50*position_dict1 + 0.25*position_dict2))/0.75     
    print weighted_average

I also haven't considered what should happen if two classes rank the same in a list. I would be grateful for any hints on that too.

I thought that I might need to write a function to solve this, but I didn't get far with that either.

Any help, along with comments explaining the code, would be great.

So I would like to calculate the weighted average of the ranking positions of the elements in the lists. For example:

class1: weighted_average = ((0.50 * 1) + (0.25 * 3))/0.75 = 1.6666..7

class2: weighted_average = ((0.50 * 2) + (0.25 * 1))/0.75 = 1.6666..7

Thank you!


Solution

  • I've taken the easy route for ties: classes with equal scores share a rank, and the next higher score gets the next integer rank. So class2 and class3 both have rank 2 in sorted_dict1.

    #!/usr/bin/env python
    
    #Get the ranks for a list of (class, score) tuples sorted by score
    #and return them in a dict
    def get_ranks(sd):
        #The first class in the list has rank 1
        k, val = sd[0]
        r = 1
        rank = {k: r}
    
        for k, v in sd[1:]:
            #Only update the rank number if this value is 
            #greater than the previous
            if v > val:
                val = v
                r += 1
            rank[k] = r
        return rank
    
    def weighted_mean(a, b):
        return (0.50*a + 0.25*b) / 0.75
    
    sorted_dict1 = [('class1', 15.17), ('class2', 15.95), ('class3', 15.95)]
    sorted_dict2 = [('class2', 9.10), ('class3', 9.22), ('class1', 10.60)]
    
    print sorted_dict1
    print sorted_dict2
    
    ranks1 = get_ranks(sorted_dict1)
    ranks2 = get_ranks(sorted_dict2)
    
    print ranks1
    print ranks2
    
    keys = sorted(k for k,v in sorted_dict1)
    
    print [(k, weighted_mean(ranks1[k], ranks2[k])) for k in keys]
    

    output

    [('class1', 15.17), ('class2', 15.949999999999999), ('class3', 15.949999999999999)]
    [('class2', 9.0999999999999996), ('class3', 9.2200000000000006), ('class1', 10.6)]
    {'class2': 2, 'class3': 2, 'class1': 1}
    {'class2': 1, 'class3': 2, 'class1': 3}
    [('class1', 1.6666666666666667), ('class2', 1.6666666666666667), ('class3', 2.0)]
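
The listing above is written for Python 2 (note the print statements) and expects lists that are already sorted. As a rough sketch only, the same shared-rank idea can be written for Python 3 starting from the unsorted dicts. The name dense_ranks, the default weights on weighted_mean, and the dict2 literal (rebuilt from sorted_dict2) are assumptions for illustration, not part of the original program.

```python
def dense_ranks(scores):
    """Map each key to its rank: 1 = lowest score, ties share a rank,
    and no rank numbers are skipped after a tie."""
    # Number the distinct scores 1, 2, 3, ... in ascending order
    rank_of_score = {s: r for r, s in enumerate(sorted(set(scores.values())), 1)}
    return {k: rank_of_score[v] for k, v in scores.items()}


def weighted_mean(a, b, w1=0.50, w2=0.25):
    # Weighted average of two rank positions
    return (w1 * a + w2 * b) / (w1 + w2)


dict1 = {'class1': 15.17, 'class2': 15.95, 'class3': 15.95}
# Assumed: rebuilt from sorted_dict2 in the question
dict2 = {'class2': 9.10, 'class3': 9.22, 'class1': 10.60}

ranks1 = dense_ranks(dict1)
ranks2 = dense_ranks(dict2)

# This assumes every class appears in both dicts
result = {k: weighted_mean(ranks1[k], ranks2[k]) for k in sorted(dict1)}
print(ranks1)
print(ranks2)
print(result)
```

Ranking the distinct scores rather than walking the sorted list means the input order does not matter, at the cost of building an extra lookup dict.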
    

    In the comments I mentioned that there's a nice way to create a weighted_mean() function with custom weights. Of course, we could just pass the weights as additional arguments to weighted_mean(), but that makes the call to weighted_mean() more cluttered than it needs to be, making the program harder to read.

    The trick is to use a function that takes the custom weights as arguments and returns the desired function. The returned inner function is a closure: it remembers the weights from the enclosing call. The function that builds it is often called a function factory.

    Here's a short demo of how to do that.

    #!/usr/bin/env python
    
    #Create a weighted mean function with weights w1 & w2
    def make_weighted_mean(w1, w2):
        wt = float(w1 + w2)
        def wm(a, b):
            return (w1 * a + w2 * b) / wt
        return wm
    
    #Make the weighted mean function
    weighted_mean = make_weighted_mean(1, 2)
    
    #Test
    print weighted_mean(6, 3)
    print weighted_mean(3, 9)
    

    output

    4.0
    7.0
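
For comparison, a similar effect can be had without writing the closure by hand. This sketch uses functools.partial from the standard library to pre-fill the weights of a four-argument function. It is an alternative technique, not the approach used in the demo above; the keyword names w1 and w2 and the variable weighted_mean_1_2 are chosen here for illustration.

```python
from functools import partial


def weighted_mean_explicit(a, b, w1, w2):
    # Weights are ordinary arguments here, so every call must supply them
    return (w1 * a + w2 * b) / float(w1 + w2)


# Pre-fill the weights once; the result takes only the two values
weighted_mean_1_2 = partial(weighted_mean_explicit, w1=1, w2=2)

print(weighted_mean_1_2(6, 3))  # same inputs as the closure demo: 4.0
print(weighted_mean_1_2(3, 9))  # 7.0
```

One difference from the closure: float(w1 + w2) is recomputed on every call here, whereas make_weighted_mean() computes the weight total once.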
    

    Here's an updated version of the first program above that handles an arbitrary number of sorted_dict lists. It uses the original get_ranks() function, but it uses a slightly more complex closure than the above example to do the weighted means on a list (or tuple) of data.

    #!/usr/bin/env python
    
    ''' Weighted means of ranks
    
        From https://stackoverflow.com/q/29413531/4014959
    
        Written by PM 2Ring 2015.04.03
    '''
    
    from pprint import pprint
    
    #Create a weighted mean function with weights from list/tuple weights
    def make_weighted_mean(weights):
        wt = float(sum(weights))
        #A function that calculates the weighted mean of values in seq 
        #weighted by the weights passed to make_weighted_mean()
        def wm(seq):
            return sum(w * v for w, v in zip(weights, seq)) / wt
        return wm
    
    
    #Get the ranks for a list of (class, score) tuples sorted by score
    #and return them in a dict
    def get_ranks(sd):
        #The first class in the list has rank 1
        k, val = sd[0]
        r = 1
        rank = {k: r}
    
        for k, v in sd[1:]:
            #Only update the rank number if this value is 
            #greater than the previous
            if v > val:
                val = v
                r += 1
            rank[k] = r
        return rank
    
    
    #Make the weighted mean function
    weights = [0.50, 0.25]
    weighted_mean = make_weighted_mean(weights)
    
    #Some test data
    sorted_dicts = [
        [('class1', 15.17), ('class2', 15.95), ('class3', 15.95), ('class4', 16.0)],
        [('class2', 9.10), ('class3', 9.22), ('class1', 10.60), ('class4', 11.0)]
    ]
    print 'Sorted dicts:'
    pprint(sorted_dicts, indent=4)
    
    all_ranks = [get_ranks(sd) for sd in sorted_dicts]
    print '\nAll ranks:'
    pprint(all_ranks, indent=4)
    
    #Get a sorted list of the keys
    keys = sorted(k for k,v in sorted_dicts[0])
    #print '\nKeys:', keys
    
    means = [(k, weighted_mean([ranks[k] for ranks in all_ranks])) for k in keys]
    print '\nWeighted means:'
    pprint(means, indent=4)
    

    output

    Sorted dicts:
    [   [   ('class1', 15.17),
            ('class2', 15.949999999999999),
            ('class3', 15.949999999999999),
            ('class4', 16.0)],
        [   ('class2', 9.0999999999999996),
            ('class3', 9.2200000000000006),
            ('class1', 10.6),
            ('class4', 11.0)]]
    
    All ranks:
    [   {   'class1': 1, 'class2': 2, 'class3': 2, 'class4': 3},
        {   'class1': 3, 'class2': 1, 'class3': 2, 'class4': 4}]
    
    Weighted means:
    [   ('class1', 1.6666666666666667),
        ('class2', 1.6666666666666667),
        ('class3', 2.0),
        ('class4', 3.3333333333333335)]
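
One caveat with the list-based closure: zip() stops at the shorter of its inputs, so passing more or fewer ranks than weights gives a plausible-looking but wrong mean instead of an error. A possible guard, written as a Python 3 sketch, is shown here; the length check and the error message are additions for illustration, not part of the program above.

```python
def make_weighted_mean(weights):
    # Copy the weights so later changes to the caller's list have no effect
    weights = tuple(weights)
    total = float(sum(weights))

    def wm(seq):
        seq = tuple(seq)
        # zip() would silently truncate on a length mismatch, so fail loudly
        if len(seq) != len(weights):
            raise ValueError(
                "expected %d values, got %d" % (len(weights), len(seq)))
        return sum(w * v for w, v in zip(weights, seq)) / total

    return wm


weighted_mean = make_weighted_mean([0.50, 0.25])
print(weighted_mean([1, 3]))

try:
    weighted_mean([1, 3, 2])
except ValueError as err:
    print("rejected: %s" % err)
```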
    

    And here's an alternate version of get_ranks() that skips rank numbers after a tie, so if two classes share rank 2, the next class gets rank 4:

    def get_ranks(sd):
        #The first class in the list has rank 1
        k, val = sd[0]
        r = 1
        rank = {k: r}
        #The step size from one rank to the next. Normally 
        #delta is 1, but it's increased if there are ties.
        delta = 1
    
        for k, v in sd[1:]:
            #Update the rank number if this value is 
            #greater than the previous. 
            if v > val:
                val = v
                r += delta
                delta = 1
            #Otherwise, update delta
            else:
                delta += 1
            rank[k] = r
        return rank
    

    Here's the output of the program using that alternate version of get_ranks():

    Sorted dicts:
    [   [   ('class1', 15.17),
            ('class2', 15.949999999999999),
            ('class3', 15.949999999999999),
            ('class4', 16.0)],
        [   ('class2', 9.0999999999999996),
            ('class3', 9.2200000000000006),
            ('class1', 10.6),
            ('class4', 11.0)]]
    
    All ranks:
    [   {   'class1': 1, 'class2': 2, 'class3': 2, 'class4': 4},
        {   'class1': 3, 'class2': 1, 'class3': 2, 'class4': 4}]
    
    Weighted means:
    [   ('class1', 1.6666666666666667),
        ('class2', 1.6666666666666667),
        ('class3', 2.0),
        ('class4', 4.0)]
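
To compare the two tie-handling strategies side by side, here is a compact sketch that folds both into one function. It is a reworking for illustration, not the code from either listing: the name rank_sorted and the skip_after_ties flag are hypothetical. As in the original, it assumes the list is already sorted by score in ascending order.

```python
def rank_sorted(sd, skip_after_ties=False):
    """Rank a list of (key, score) tuples sorted by ascending score.

    Tied scores share a rank. With skip_after_ties=True the rank after
    a tie jumps to the item's position in the list (1, 2, 2, 4);
    otherwise it is simply the next integer (1, 2, 2, 3).
    """
    ranks = {}
    prev_score = None
    rank = 0
    for position, (key, score) in enumerate(sd, 1):
        # A new rank starts at the first item and whenever the score rises
        if position == 1 or score > prev_score:
            rank = position if skip_after_ties else rank + 1
            prev_score = score
        ranks[key] = rank
    return ranks


data = [('class1', 15.17), ('class2', 15.95), ('class3', 15.95), ('class4', 16.0)]

no_gaps = rank_sorted(data)
with_gaps = rank_sorted(data, skip_after_ties=True)
print(no_gaps)
print(with_gaps)
```

On this data only class4 differs between the two: it gets rank 3 without gaps and rank 4 with gaps. Unlike the listings above, this version returns an empty dict for an empty list instead of raising IndexError.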