python python-3.x performance ipython luhn

Unexpected performance with Luhn's Algorithm in (I)python

I've implemented two python versions of the checksum part of Luhn's algorithm. The code snippets are nearly identical except for how they calculate the second summation. It differs from the usual implementation, in that it computes the total sum, and then updates the sum with a correction factor (as seen here).

def manual_loop(xs):
    differences = (0, 1, 2, 3, 4, -4, -3, -2, -1, 0)
    total = sum(xs) % 10
    for x in xs[len(xs)%2::2]:
        total += differences[x]
    return total % 10

def builtin_loop(xs):
    differences = (0, 1, 2, 3, 4, -4, -3, -2, -1, 0)
    total = sum(xs) % 10
    total += sum(differences[x] for x in xs[len(xs)%2::2])
    return total % 10

I then timed the code in an Ipython terminal using timeit on three samples.

from random import randint
randoms5 = [randint(0, 9) for _ in range(10**5)]
randoms6 = [randint(0, 9) for _ in range(10**6)]
randoms7 = [randint(0, 9) for _ in range(10**7)]

The results are as follows:

>>> %timeit manual_loop(randoms5)
2.69 ms ± 17 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit manual_loop(randoms6)
29.3 ms ± 535 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> %timeit manual_loop(randoms7)
311 ms ± 1.52 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

>>> %timeit builtin_loop(randoms5)
3.31 ms ± 146 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
>>> %timeit builtin_loop(randoms6)
34.8 ms ± 1.31 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
>>> %timeit builtin_loop(randoms7)
337 ms ± 5.69 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

What gives? I would have expected python's built-in sum to give far better performance than doing the looping myself, especially for lists of this size.

NOTE: I have left out other variants such as doing both summations manually and swapping which summation used the built-in sum as they gave far worse times.

EDIT: Upon request I've rerun the tests with a fixed seed.

import random
random.seed(42)
randoms5 = [random.randint(0, 9) for _ in range(10**5)]
randoms6 = [random.randint(0, 9) for _ in range(10**6)]
randoms7 = [random.randint(0, 9) for _ in range(10**7)]

Results are more expected for randoms5, but remain weird for bigger tests.

>>> %timeit manual_loop(randoms5)
3.36 ms ± 346 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

>>> %timeit builtin_loop(randoms5)
3.23 ms ± 76.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

>>> %timeit manual_loop(randoms6)
31.2 ms ± 890 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

>>> %timeit builtin_loop(randoms6)
35.4 ms ± 2.27 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

>>> %timeit manual_loop(randoms7)
311 ms ± 5.43 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

>>> %timeit builtin_loop(randoms7)
341 ms ± 8.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Solution

Your sum call doesn't get to avoid an interpreted loop. sum itself isn't interpreted, but the generator expression it's looping over is interpreted. Generator expressions have higher overhead than regular Python loops, due to all the work of suspending and resuming the generator's stack frame over and over.