Search code examples
pythonsortingkey

python list sort key function choice


I have a list of objects and would like to sort them according to the return value of an instance function. There are two ways to do them

from operator import methodcaller

l.sort(key=lambda x:x.f())
l.sort(key=methodcaller('f'))

Is one way betther than the other? Or it's just a personal preference?


Solution

  • methodcaller('f') is faster, because it can do both the attribute lookup and the method call in C code.

    The lambda adds the following extra overhead:

    • Calling the lambda has to step out of the sort() C loop back into Python code. This requires a new frame object with associated data.

    • Looking up the method attribute is a Python opcode with more overhead than the direct equivalent in C.

    • Calling the method from a Python frame next has to push that frame on the Python call stack again. C code has a stack too, but this is far lighter.

    • Returning from the called method goes back to the Python frame, popping that from the stack, and after which the lambda returns, causing the function frame to be destroyed again (which is more work still).

    You can measure the difference:

    >>> from timeit import timeit
    >>> timeit('m("")', 'm = lambda s: s.lower()', number=10**7)
    1.2575681940070353
    >>> timeit('m("")', 'from operator import methodcaller; m = methodcaller("lower")', number=10**7)
    1.061251598992385
    

    So on 7 million calls to str.lower() on an empty string, a methodcaller() is about 16% faster.

    Now, if all your data is of the exact same type, where object.f would always bind to the same method, then you can just use the unbound method:

    l.sort(key=SharedType.f)
    

    That saves you having to look it up on each of the instances.