Tags: python, performance-testing, python-3.6, execution-time

Python: performance of finding divisors


Benchmarking these functions:

import math

def divisors_optimized(number):
    square_root = int(math.sqrt(number))

    for divisor in range(1, square_root):
        if number % divisor == 0:
            yield divisor
            yield number // divisor  # floor division keeps divisors as ints

    # handle divisor == square_root separately: it can divide number even
    # when number is not a perfect square, and a square's root must be
    # yielded exactly once
    if number % square_root == 0:
        yield square_root
        if square_root ** 2 != number:
            yield number // square_root

def number_of_divisors_optimized(number):
    count = 0
    square_root = int(math.sqrt(number))

    for divisor in range(1, square_root):
        if number % divisor == 0:
            count += 2

    # same boundary handling as above
    if number % square_root == 0:
        count += 1 if square_root ** 2 == number else 2

    return count

You can see that the basic structure is identical in both.

Benchmark code:

import time
import utils

number = 9999999
for i in range(10):
    print(f"iteration {i}:")
    start = time.time()
    result = list(utils.divisors_optimized(number))
    end = time.time()
    print(f'len(divisors_optimized) took {end - start} seconds and found {len(result)} divisors.')

    start = time.time()
    result = utils.number_of_divisors_optimized(number)
    end = time.time()
    print(f'number_of_divisors_optimized took {end - start} seconds and found {result} divisors.')

    print()

Output:

iteration 0:
len(divisors_optimized) took 0.00019598007202148438 seconds and found 12 divisors.
number_of_divisors_optimized took 0.0001919269561767578 seconds and found 12 divisors.

iteration 1:
len(divisors_optimized) took 0.00019121170043945312 seconds and found 12 divisors.
number_of_divisors_optimized took 0.00020599365234375 seconds and found 12 divisors.

iteration 2:
len(divisors_optimized) took 0.000179290771484375 seconds and found 12 divisors.
number_of_divisors_optimized took 0.00019049644470214844 seconds and found 12 divisors.

iteration 3:
len(divisors_optimized) took 0.00019025802612304688 seconds and found 12 divisors.
number_of_divisors_optimized took 0.00020170211791992188 seconds and found 12 divisors.

iteration 4:
len(divisors_optimized) took 0.0001785755157470703 seconds and found 12 divisors.
number_of_divisors_optimized took 0.00017905235290527344 seconds and found 12 divisors.

iteration 5:
len(divisors_optimized) took 0.00022721290588378906 seconds and found 12 divisors.
number_of_divisors_optimized took 0.00020170211791992188 seconds and found 12 divisors.

iteration 6:
len(divisors_optimized) took 0.0001919269561767578 seconds and found 12 divisors.
number_of_divisors_optimized took 0.00018930435180664062 seconds and found 12 divisors.

iteration 7:
len(divisors_optimized) took 0.00017881393432617188 seconds and found 12 divisors.
number_of_divisors_optimized took 0.00017905235290527344 seconds and found 12 divisors.

iteration 8:
len(divisors_optimized) took 0.00017976760864257812 seconds and found 12 divisors.
number_of_divisors_optimized took 0.0001785755157470703 seconds and found 12 divisors.

iteration 9:
len(divisors_optimized) took 0.00024819374084472656 seconds and found 12 divisors.
number_of_divisors_optimized took 0.00020766258239746094 seconds and found 12 divisors.

You can see that the execution times are very close, with neither version consistently faster.

Can somebody explain how creating a list out of a generator and taking its length can be about as fast as just counting while iterating? Shouldn't the memory allocation done by list() be much more expensive than simple integer assignments?

I'm using Python 3.6.3.


Solution

  • You're testing far more than you're producing. The cost of counting versus building a list from the generator in the "found a divisor" case pales in comparison to the total work done: you're performing over 3000 trial divisions, and twelve yields vs. twelve additions is chump change against that sort of work. If you replace the additions/yields with pass (doing nothing at all), you'll find it still runs in roughly the same amount of time:

    def ignore_divisors_optimized(number):
        square_root = int(math.sqrt(number))
    
        for divisor in range(1, square_root):
            if number % divisor == 0:
                pass
    
        if number % square_root == 0:
            pass
    

    And microbenchmarking with IPython's %timeit magic:

    >>> %timeit -r5 number_of_divisors_optimized(9999999)
    266 µs ± 1.85 µs per loop (mean ± std. dev. of 5 runs, 1000 loops each)
    >>> %timeit -r5 list(divisors_optimized(9999999))
    267 µs ± 1.29 µs per loop (mean ± std. dev. of 5 runs, 1000 loops each)
    >>> %timeit -r5 ignore_divisors_optimized(9999999)
    267 µs ± 1.43 µs per loop (mean ± std. dev. of 5 runs, 1000 loops each)
    

    The fact that number_of_divisors_optimized was a microsecond faster isn't meaningful (run-to-run jitter is larger than a microsecond); they're all essentially the same speed, because more than 99% of the work is the loop and the divisibility test, not what happens when the test passes.
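
    Outside IPython, the standard library's timeit module gives the same kind of measurement; here is a minimal sketch, assuming the functions live in a utils module as in the question's benchmark. Taking the minimum of several runs is less sensitive to background load than a single time.time() delta:

    import timeit

    # time each statement 1000 times per run, over 5 runs, and report the
    # best run; min() discards runs inflated by scheduler noise
    setup = "import utils; number = 9999999"
    for stmt in ("utils.number_of_divisors_optimized(number)",
                 "list(utils.divisors_optimized(number))"):
        runs = timeit.repeat(stmt, setup=setup, repeat=5, number=1000)
        print(f"{stmt}: {min(runs) / 1000 * 1e6:.1f} µs per call")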

    This is an example of the 90/10 rule of optimization: roughly 90% of the time is spent in 10% of the code (in this case, the trial division itself), and only 10% in the other 90%. You're optimizing a small part of the 90% of the code that accounts for only 10% of the runtime, and it doesn't help, because the vast majority of the time is spent on the if number % divisor == 0: line. If you remove that test in favor of just looping over the range doing nothing, the runtime drops to ~78 µs in my local microbenchmarks, meaning the test occupies nearly 200 µs of the runtime, more than twice what all the rest of the code put together requires.
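
    You can reproduce that comparison by timing a loop with the divisibility test removed entirely; this sketch (the loop_only name is mine, and the ~78 µs figure is machine-dependent) isolates the cost of the bare iteration:

    import math

    def loop_only(number):
        # same number of iterations as the functions above, but without
        # the `number % divisor == 0` test: timing this shows how little
        # of the total runtime the loop bookkeeping itself accounts for
        square_root = int(math.sqrt(number))
        for divisor in range(1, square_root):
            pass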

    If you want to optimize this, you need to look at approaches that either speed up the trial-division line itself (which basically means a different Python interpreter, or using Cython to compile it down to C) or run that line fewer times, e.g. by precomputing the primes up to a certain bound so that any given input is only trial-divided by primes, then deriving the divisor count from the prime factors and their multiplicities, as sketched below.
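
    A sketch of that second approach: if number = p1**a1 * p2**a2 * ..., the divisor count is (a1 + 1) * (a2 + 1) * ..., so only primes ever need to be trial-divided. The helper names and the sieve bound below are illustrative, not from the answer above:

    import math

    def primes_up_to(limit):
        # Sieve of Eratosthenes; compute once, reuse for many inputs
        sieve = [True] * (limit + 1)
        sieve[0:2] = [False, False]
        for p in range(2, int(math.sqrt(limit)) + 1):
            if sieve[p]:
                sieve[p * p::p] = [False] * len(sieve[p * p::p])
        return [i for i, is_prime in enumerate(sieve) if is_prime]

    PRIMES = primes_up_to(3163)  # must cover sqrt of the largest input

    def number_of_divisors_via_primes(number):
        # d(p1**a1 * p2**a2 * ...) = (a1 + 1) * (a2 + 1) * ...
        count = 1
        remaining = number
        for p in PRIMES:
            if p * p > remaining:
                break
            multiplicity = 0
            while remaining % p == 0:
                remaining //= p
                multiplicity += 1
            count *= multiplicity + 1
        if remaining > 1:  # at most one prime factor above the square root remains
            count *= 2
        return count

    For number = 9999999 = 3**2 * 239 * 4649 this returns (2 + 1) * (1 + 1) * (1 + 1) = 12, matching the benchmark output, while performing far fewer modulo operations than the ~3000 trial divisions above.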