Search code examples
pythonarrayspython-2.7timelist-comprehension

Comparing list comprehensions and explicit loops (3 array generators faster than 1 for loop)


I was doing my homework and accidentally found a strange inconsistency in the algorithm's speed.

Here are two code versions of the same function but with a difference. In the first version, I used generators three times to filter an array and in the second version, I used a for loop with three conditional statements to do the same filter work.

So, here is the code of the first version:

def kth_order_statistic(array, k):
    pivot = (array[0] + array[len(array) - 1]) // 2
    l = [x for x in array if x < pivot]
    m = [x for x in array if x == pivot]
    r = [x for x in array if x > pivot]

    if k <= len(l):
        return kth_order_statistic(l, k)
    elif k > len(l) + len(m):
        return kth_order_statistic(r, k - len(l) - len(m))
    else:
        return m[0]

and here is the code of the second version:

def kth_order_statistic2(array, k):
    pivot = (array[0] + array[len(array) - 1]) // 2
    l = []
    m = []
    r = []

    for x in array:
        if x < pivot:
            l.append(x)
        elif x > pivot:
            r.append(x)
        else:
            m.append(x)

    if k <= len(l):
        return kth_order_statistic2(l, k)
    elif k > len(l) + len(m):
        return kth_order_statistic2(r, k - len(l) - len(m))
    else:
        return m[0]

The IPython output for the first version is:

In [4]: %%timeit
   ...: A = range(100000)
   ...: shuffle(A)
   ...: k = randint(1, len(A)-1)
   ...: order_statisctic(A, k)
   ...:
10 loops, best of 3: 120 ms per loop

And the IPython output for the second version is:

In [5]: %%timeit
   ...: A = range(100000)
   ...: shuffle(A)
   ...: k = randint(1, len(A)-1)
   ...: kth_order_statistic2(A, k)
   ...:
10 loops, best of 3: 169 ms per loop

So why is the first version faster than the second? I also made a third version which uses filter() function instead of an array generator and it was slower than the second version (it got 218 ms per loop)


Solution

  • Let's define the functions we will need to answer the question and timeit them:

    In [18]: def iter():
        l = [x for x in range(100) if x > 10]
       ....:
    
    In [19]: %timeit iter()
    100000 loops, best of 3: 7.92 µs per loop
    
    In [20]: def loop():
        l = []
        for x in range(100):
            if x > 10:
                l.append(x)
       ....:
    
    In [21]: %timeit loop()
    10000 loops, best of 3: 20 µs per loop
    
    In [22]: def loop_fast():
        l = []
        for x in range(100):
            if x > 10:
                pass
       ....:
    
    In [23]: %timeit loop_fast()
    100000 loops, best of 3: 4.69 µs per loop
    

    we can see that the for loops without the append command is as fast as the list comprehension. In fact, if we have a look at the bytecode we can see that in the case of the list comprehension python is able to use a built-in bytecode command called LIST_APPEND instead of:

    • Load the list: 40 LOAD_FAST
    • Load the attribute: 43 LOAD_ATTRIBUTE
    • Call the loaded function: 49 CALL_FUNCTION
    • Unload the list(?): 52 POP_TOP

    As you can see from the output below the previous bytecode are missing with list comprehension and with the "loop_fast" function. Comparing the timeit of the three function is clear that those are responsible for the different timing of the three methods.

    In [27]: dis.dis(iter)
      2          0 BUILD_LIST             0
                 3 LOAD_GLOBAL            0 (range)
                 6 LOAD_CONST             1 (1)
                 9 LOAD_CONST             2 (100)
                12 CALL_FUNCTION          2
                15 GET_ITER
           >>   16 FOR_ITER              24 (to 43)
                19 STORE_FAST             0 (x)
                22 LOAD_FAST              0 (x)
                25 LOAD_CONST             2 (100)
                28 COMPARE_OP             4 (>)
                31 POP_JUMP_IF_FALSE     16
                34 LOAD_FAST              0 (x)
                37 LIST_APPEND            2
                40 JUMP_ABSOLUTE         16
           >>   43 STORE_FAST             1 (l)
                46 LOAD_CONST             0 (None)
                49 RETURN_VALUE
    
    In [28]: dis.dis(loop)
      2          0 BUILD_LIST             0
                 3 STORE_FAST             0 (1)
    
      3          6 SETUP_LOOP            51 (to 60)
                 9 LOAD_GLOBAL            0 (range)
                12 LOAD_CONST             1 (1)
                15 LOAD_CONST             2 (100)
                18 CALL_FUNCTION          2
                21 GET_ITER
           >>   22 FOR_ITER              34 (to 59)
                25 STORE_FAST             1 (x)
    
      4         28 LOAD_FAST              1 (x)
                31 LOAD_CONST             3 (10)
                34 COMPARE_OP             4 (>)
                37 POP_JUMP_IF_FALSE     22
    
      5         40 LOAD_FAST              0 (l)
                43 LOAD_ATTR              1 (append)
                46 LOAD_FAST              1 (x)
                49 CALL_FUNCTION          1
                52 POP_TOP
                53 JUMP_ABSOLUTE         22
                56 JUMP_ABSOLUTE         22
           >>   59 POP_BLOCK
           >>   60 LOAD_CONST             0 (None)
                63 RETURN_VALUE
    
    In [29]: dis.dis(loop_fast)
      2          0 BUILD_LIST             0
                 3 STORE_FAST             0 (1)
    
      3          6 SETUP_LOOP            38 (to 47)
                 9 LOAD_GLOBAL            0 (range)
                12 LOAD_CONST             1 (1)
                15 LOAD_CONST             2 (100)
                18 CALL_FUNCTION          2
                21 GET_ITER
           >>   22 FOR_ITER              21 (to 46)
                25 STORE_FAST             1 (x)
    
      4         28 LOAD_FAST              1 (x)
                31 LOAD_CONST             3 (10)
                34 COMPARE_OP             4 (>)
                37 POP_JUMP_IF_FALSE     22
    
      5         40 JUMP_ABSOLUTE         22
                43 JUMP_ABSOLUTE         22
           >>   46 POP_BLOCK
           >>   47 LOAD_CONST             0 (None)
                50 RETURN_VALUE