python arrays python-2.7 time list-comprehension

Comparing list comprehensions and explicit loops (3 array generators faster than 1 for loop)

I was doing my homework and accidentally found a strange inconsistency in the algorithm's speed.

Here are two code versions of the same function but with a difference. In the first version, I used generators three times to filter an array and in the second version, I used a for loop with three conditional statements to do the same filter work.

So, here is the code of the first version:

def kth_order_statistic(array, k):
    pivot = (array[0] + array[len(array) - 1]) // 2
    l = [x for x in array if x < pivot]
    m = [x for x in array if x == pivot]
    r = [x for x in array if x > pivot]

    if k <= len(l):
        return kth_order_statistic(l, k)
    elif k > len(l) + len(m):
        return kth_order_statistic(r, k - len(l) - len(m))
    else:
        return m[0]

and here is the code of the second version:

def kth_order_statistic2(array, k):
    pivot = (array[0] + array[len(array) - 1]) // 2
    l = []
    m = []
    r = []

    for x in array:
        if x < pivot:
            l.append(x)
        elif x > pivot:
            r.append(x)
        else:
            m.append(x)

    if k <= len(l):
        return kth_order_statistic2(l, k)
    elif k > len(l) + len(m):
        return kth_order_statistic2(r, k - len(l) - len(m))
    else:
        return m[0]

The IPython output for the first version is:

In [4]: %%timeit
   ...: A = range(100000)
   ...: shuffle(A)
   ...: k = randint(1, len(A)-1)
   ...: order_statisctic(A, k)
   ...:
10 loops, best of 3: 120 ms per loop

And the IPython output for the second version is:

In [5]: %%timeit
   ...: A = range(100000)
   ...: shuffle(A)
   ...: k = randint(1, len(A)-1)
   ...: kth_order_statistic2(A, k)
   ...:
10 loops, best of 3: 169 ms per loop

So why is the first version faster than the second? I also made a third version which uses filter() function instead of an array generator and it was slower than the second version (it got 218 ms per loop)

Solution

Let's define the functions we will need to answer the question and timeit them:

In [18]: def iter():
    l = [x for x in range(100) if x > 10]
   ....:

In [19]: %timeit iter()
100000 loops, best of 3: 7.92 µs per loop

In [20]: def loop():
    l = []
    for x in range(100):
        if x > 10:
            l.append(x)
   ....:

In [21]: %timeit loop()
10000 loops, best of 3: 20 µs per loop

In [22]: def loop_fast():
    l = []
    for x in range(100):
        if x > 10:
            pass
   ....:

In [23]: %timeit loop_fast()
100000 loops, best of 3: 4.69 µs per loop

we can see that the for loops without the append command is as fast as the list comprehension. In fact, if we have a look at the bytecode we can see that in the case of the list comprehension python is able to use a built-in bytecode command called LIST_APPEND instead of:

Load the list: 40 LOAD_FAST
Load the attribute: 43 LOAD_ATTRIBUTE
Call the loaded function: 49 CALL_FUNCTION
Unload the list(?): 52 POP_TOP

As you can see from the output below the previous bytecode are missing with list comprehension and with the "loop_fast" function. Comparing the timeit of the three function is clear that those are responsible for the different timing of the three methods.

In [27]: dis.dis(iter)
  2          0 BUILD_LIST             0
             3 LOAD_GLOBAL            0 (range)
             6 LOAD_CONST             1 (1)
             9 LOAD_CONST             2 (100)
            12 CALL_FUNCTION          2
            15 GET_ITER
       >>   16 FOR_ITER              24 (to 43)
            19 STORE_FAST             0 (x)
            22 LOAD_FAST              0 (x)
            25 LOAD_CONST             2 (100)
            28 COMPARE_OP             4 (>)
            31 POP_JUMP_IF_FALSE     16
            34 LOAD_FAST              0 (x)
            37 LIST_APPEND            2
            40 JUMP_ABSOLUTE         16
       >>   43 STORE_FAST             1 (l)
            46 LOAD_CONST             0 (None)
            49 RETURN_VALUE

In [28]: dis.dis(loop)
  2          0 BUILD_LIST             0
             3 STORE_FAST             0 (1)

  3          6 SETUP_LOOP            51 (to 60)
             9 LOAD_GLOBAL            0 (range)
            12 LOAD_CONST             1 (1)
            15 LOAD_CONST             2 (100)
            18 CALL_FUNCTION          2
            21 GET_ITER
       >>   22 FOR_ITER              34 (to 59)
            25 STORE_FAST             1 (x)

  4         28 LOAD_FAST              1 (x)
            31 LOAD_CONST             3 (10)
            34 COMPARE_OP             4 (>)
            37 POP_JUMP_IF_FALSE     22

  5         40 LOAD_FAST              0 (l)
            43 LOAD_ATTR              1 (append)
            46 LOAD_FAST              1 (x)
            49 CALL_FUNCTION          1
            52 POP_TOP
            53 JUMP_ABSOLUTE         22
            56 JUMP_ABSOLUTE         22
       >>   59 POP_BLOCK
       >>   60 LOAD_CONST             0 (None)
            63 RETURN_VALUE

In [29]: dis.dis(loop_fast)
  2          0 BUILD_LIST             0
             3 STORE_FAST             0 (1)

  3          6 SETUP_LOOP            38 (to 47)
             9 LOAD_GLOBAL            0 (range)
            12 LOAD_CONST             1 (1)
            15 LOAD_CONST             2 (100)
            18 CALL_FUNCTION          2
            21 GET_ITER
       >>   22 FOR_ITER              21 (to 46)
            25 STORE_FAST             1 (x)

  4         28 LOAD_FAST              1 (x)
            31 LOAD_CONST             3 (10)
            34 COMPARE_OP             4 (>)
            37 POP_JUMP_IF_FALSE     22

  5         40 JUMP_ABSOLUTE         22
            43 JUMP_ABSOLUTE         22
       >>   46 POP_BLOCK
       >>   47 LOAD_CONST             0 (None)
            50 RETURN_VALUE