Quickly rank variables in Python


What is the fastest way to rank a handful of variables? I have 4 integer variables and need to rank them. This process runs many, many times, so it has to be quick. I tried collections.Counter with its most_common() method, which works well but is slower than just counting with individual variables. Here is an example of what I am running:

A = 15
B = 10
C = 5
D = 10

def get_highest(A,B,C,D):
    count = A
    label = 'A'
    if B >= count:
        count = B
        label = 'B'
    if C >= count:
        count = C
        label = 'C'
    if D >= count:
        count = D
        label = 'D'

    return count, label

highest, label = get_highest(A,B,C,D)
if label == 'A':
    A=0
if label == 'B':
    B=0
if label == 'C':
    C=0
if label == 'D':
    D=0
second_highest, label = get_highest(A,B,C,D)

I continue until I have the ranks of all the variables. Is there a faster way to do this? I would also like to implement this in Cython, so answers that speed up well when compiled with Cython would be appreciated.


Solution

  • Here's a more compact alternative to your function (measure it against your version before relying on it being faster):

    import operator
    
    def get_highest(A,B,C,D):
        return max(zip((A, B, C, D), 'ABCD'), key=operator.itemgetter(0))
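
One behavioural difference is worth checking before swapping the two versions: ties. The sketch below puts the question's function next to the max()-based one under the hypothetical names get_highest_original and get_highest_max (both names are made up for this comparison) and calls them with tied values.

```python
import operator

def get_highest_original(A, B, C, D):
    # The question's version: ">=" means a later variable wins a tie.
    count, label = A, 'A'
    if B >= count:
        count, label = B, 'B'
    if C >= count:
        count, label = C, 'C'
    if D >= count:
        count, label = D, 'D'
    return count, label

def get_highest_max(A, B, C, D):
    # The max()-based version: max() returns the first maximal item it meets.
    return max(zip((A, B, C, D), 'ABCD'), key=operator.itemgetter(0))

print(get_highest_original(5, 10, 5, 10))  # (10, 'D')
print(get_highest_max(5, 10, 5, 10))       # (10, 'B')
```

If the order of tied labels matters downstream, pick one convention and keep it consistent.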
    

    However, if your goal, as it appears, is to zero out the maximum-valued variable, you may be better off having the function do even more:

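    def max_becomes_zero(A, B, C, D):
        temp = [A, B, C, D]
        # Find the position and value of the largest entry (needs `import operator`).
        maxind, maxval = max(enumerate(temp), key=operator.itemgetter(1))
        maxname = 'ABCD'[maxind]
        # Zero out the winner so the next call finds the runner-up.
        temp[maxind] = 0
        return temp, maxval, maxname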
    

    to be called as follows:

    (A, B, C, D), highest, label = max_becomes_zero(A, B, C, D)
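
If what you ultimately need is the rank of all four variables, repeated max-and-zero calls may be more work than necessary. As a minimal sketch (rank_values is a hypothetical helper, not part of the code above), a single sorted() call returns every (value, label) pair from highest to lowest. It also avoids overwriting values with 0, so it does not assume the inputs are positive.

```python
import operator

def rank_values(A, B, C, D):
    """Return (value, label) pairs ordered from highest to lowest value."""
    pairs = zip((A, B, C, D), 'ABCD')
    # sorted() is stable even with reverse=True, so tied values keep
    # their original A, B, C, D order.
    return sorted(pairs, key=operator.itemgetter(0), reverse=True)

ranking = rank_values(15, 10, 5, 10)
print(ranking)  # [(15, 'A'), (10, 'B'), (10, 'D'), (5, 'C')]
```

Whether this beats the hand-written comparisons for only four values is something to measure rather than assume.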
    

    Added: some may wonder (and did ask in comments) about relative speeds of operator.itemgetter vs a lambda. Answer: don't wonder, measure. That's what the timeit module in Python's standard library is for...:

    $ python -mtimeit -s'a="something"' 'max(enumerate(a), key=lambda x: x[1])'
    1000000 loops, best of 3: 1.56 usec per loop
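    $ python -mtimeit -s'a="something"; import operator' 'max(enumerate(a), key=operator.itemgetter(1))'
    

    Note the key= in the second command. The figure originally reported for it (0.363 usec per loop, on my Linux workstation with Python 2.7.9) came from a run that passed operator.itemgetter(1) as a second positional argument instead. In Python 2 that makes max() compare the enumerate object with the itemgetter object and never call the key at all, so the apparent speedup of more than 4 times does not measure itemgetter against lambda. Rerun both commands as written here to get a comparable number for your own setup.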

    More generally, avoiding lambda whenever feasible will make you much happier.

    Note: it's important to time only the operation of interest. Put preliminaries such as the initialization of a and the import in the setup, i.e. in the -s flag when using timeit from the command line in the (recommended) python -mtimeit form. I suspect a mistake of this kind is what's stopping a commenter from reproducing these results (just a guess, since said commenter is not showing the exact code being timed).
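
The same measurement can be made from inside Python, which is handy when a shell is not available. This is a minimal sketch, assuming Python 3.5 or later for the globals argument of timeit.repeat. The best_time helper and the namespace dictionary are hypothetical names introduced here. As in the command-line form, the setup values are kept out of the timed statement.

```python
import operator
import timeit

def best_time(stmt, number=100000, repeat=3):
    """Return the best total time in seconds for `number` runs of stmt."""
    # Setup values live in the namespace, not in the timed statement,
    # mirroring what the -s flag does on the command line.
    namespace = {"a": "something", "operator": operator}
    return min(timeit.repeat(stmt, globals=namespace, repeat=repeat, number=number))

lambda_time = best_time("max(enumerate(a), key=lambda x: x[1])")
getter_time = best_time("max(enumerate(a), key=operator.itemgetter(1))")

print("lambda:     %.3f usec per loop" % (lambda_time / 100000 * 1e6))
print("itemgetter: %.3f usec per loop" % (getter_time / 100000 * 1e6))
```

Both statements build their key function on every loop, just like the command-line runs, so the comparison stays like for like. The numbers depend on the machine and Python version, so none are quoted here.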