This works well and returns 249999500000:
sum([x for x in range(1_000_000) if x%2==0])
This is slower but still returns 24999995000000:
sum([x for x in range(10_000_000) if x%2==0])
However, larger range of values such as 1_000_000_000
takes a very long time to compute. In fact, this returns an error:
sum([x for x in range(10_000_000_000) if x%2==0])
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in <listcomp>
MemoryError.
I've seen posts saying that the built-in sum function is the fastest but this doesn't look like being fast to me. My question then is, do we have a faster Pythonic approach to compute large sums like this? Or is it perhaps a hardware limitation?
After I posted the below code based on the Gauss Sum, Kelly Bundy pointed out a simplification of the math and a practical simplification making use of properties of a range
object. He suggested I add this here.
An arbitrary range can be visualized as a trapezoid. That there is the sum for the area of a trapezoid.
It also uses the ability of a range
to know its length. That simplifies my (stop - start + step-1) // step
expression.
def sum_range(*args):
rg = range(*args)
return (rg[0] + rg[-1]) * len(rg) // 2 if rg else 0
Following is my original answer. I've gridsearched both functions for being identical to sum_naive
, given argument ranges that cover all the cases.
for arbitrary start, stop, step
:
#def sum_naive(start, stop, step): return sum(range(start, stop, step))
def sum_gauss(start, stop, step=1):
if step == 0:
raise ValueError("range() arg 3 must not be zero")
if step < 0:
step = -step
shift = 1 + (start - stop - 1) % step
(start, stop) = (stop+shift, start+1)
if start >= stop:
return 0
# triangle on a base
n = (stop - start + step-1) // step
base = start * n
triangle = n * (n - 1) // 2 * step
return base + triangle
A "geometric" interpretation should make these expressions self-evident. ∎ SCNR
The negative step case is rewritten into a positive step case using equivalent start
and stop
.
The sums of even and odd integers in a given range start, stop
are simply those:
one_half = sum_gauss(start, stop, 2)
other_half = sum_gauss(start+1, stop, 2)
# no need to know which are which
assert one_half + other_half == sum_gauss(start, stop, 1)