Search code examples
pythonsumlist-comprehension

Python's built in sum function taking forever to compute sums of very large range of values in a list


This works well and returns 249999500000:

sum([x for x in range(1_000_000) if x%2==0])

This is slower but still returns 24999995000000:

sum([x for x in range(10_000_000) if x%2==0])

However, larger range of values such as 1_000_000_000 takes a very long time to compute. In fact, this returns an error:

sum([x for x in range(10_000_000_000) if x%2==0])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <listcomp>
MemoryError.

I've seen posts saying that the built-in sum function is the fastest but this doesn't look like being fast to me. My question then is, do we have a faster Pythonic approach to compute large sums like this? Or is it perhaps a hardware limitation?


Solution

  • After I posted the below code based on the Gauss Sum, Kelly Bundy pointed out a simplification of the math and a practical simplification making use of properties of a range object. He suggested I add this here.

    An arbitrary range can be visualized as a trapezoid. That there is the sum for the area of a trapezoid.

    It also uses the ability of a range to know its length. That simplifies my (stop - start + step-1) // step expression.

    def sum_range(*args):
        rg = range(*args)
        return (rg[0] + rg[-1]) * len(rg) // 2 if rg else 0
    

    Following is my original answer. I've gridsearched both functions for being identical to sum_naive, given argument ranges that cover all the cases.


    for arbitrary start, stop, step:

    #def sum_naive(start, stop, step): return sum(range(start, stop, step))
    
    def sum_gauss(start, stop, step=1):
        if step == 0:
            raise ValueError("range() arg 3 must not be zero")
    
        if step < 0:
            step = -step
            shift = 1 + (start - stop - 1) % step
            (start, stop) = (stop+shift, start+1)
    
        if start >= stop:
            return 0
    
        # triangle on a base
        n = (stop - start + step-1) // step
        base = start * n
        triangle = n * (n - 1) // 2 * step
        return base + triangle
    

    A "geometric" interpretation should make these expressions self-evident. ∎ SCNR

    The negative step case is rewritten into a positive step case using equivalent start and stop.

    The sums of even and odd integers in a given range start, stop are simply those:

    one_half = sum_gauss(start, stop, 2)
    other_half = sum_gauss(start+1, stop, 2)
    # no need to know which are which
    assert one_half + other_half == sum_gauss(start, stop, 1)