Imagine you want to process all pairs of the numbers 0 to n-1, for example for n = 4
that's these six pairs:
[(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
Three ways to create those pairs:
list(combinations(range(n), 2))
[(i, j) for i, j in combinations(range(n), 2)]
[(i, j) for i in range(n) for j in range(i+1, n)]
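A quick sanity check (a sketch) that the three approaches really produce the same list:

```python
from itertools import combinations

n = 4
a = list(combinations(range(n), 2))
b = [(i, j) for i, j in combinations(range(n), 2)]
c = [(i, j) for i in range(n) for j in range(i+1, n)]

print(a == b == c)  # True
print(a)            # [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```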
Benchmark results for n = 1000:
44.1 ms ± 0.2 ms f_combinations_pure
57.7 ms ± 0.3 ms f_combinations
66.6 ms ± 0.1 ms f_ranges
Note I'm not really interested in just storing the pairs `(i, j)`. That's just a minimal example usage of `i` and `j` so that we can compare different approaches without much overhead. In reality, you want to do something with `i` and `j`, for example `[my_string[i:j] for ...]` to get substrings (the question whose comments inspired this). So the `list(combinations(...))` one doesn't really count here, and I show it just to make that clear (although I still liked seeing how fast it is).
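For the substring use case, one common idiom (a sketch, not part of the benchmark) is to pair up the `len + 1` slice boundaries rather than the `len` indices, so the last character can be included:

```python
from itertools import combinations

my_string = 'abc'
# pairs (i, j) of slice boundaries with 0 <= i < j <= len(my_string)
substrings = [my_string[i:j]
              for i, j in combinations(range(len(my_string) + 1), 2)]
print(substrings)  # ['a', 'ab', 'abc', 'b', 'bc', 'c']
```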
Question 1: Why is `f_ranges` slower than `f_combinations`? Its `for i in` runs only `n` times overall, so it's insignificant compared to the `for j in`, which runs `n*(n-1)/2` times. And `for j in range(...)` only assigns one number, whereas `for i, j in combinations(...)` builds and assigns pairs of numbers, so the latter should be slower. Why is it faster?

Question 2: What's the fastest way you can come up with? For fair comparison, it shall be a list comprehension `[(i, j) for ...]` producing the same list of pairs.

(As I'm including an answer myself (which is encouraged), I'm including benchmark code there.)
## Why is `range` slower than `combinations`?

While `for j in range(...)` indeed has the advantage of assigning just one number, it has the disadvantage of creating those numbers over and over again. In Python, numbers are objects, and their creation (and deletion) takes a little time.
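One quick way to see this, relying on a CPython implementation detail (not guaranteed by the language): integers outside the small-int cache are created fresh each time `range` produces them.

```python
# CPython detail: ints from -5 to 256 are cached singletons,
# anything larger is a fresh object each time it's produced.
small_a = list(range(10, 11))
small_b = list(range(10, 11))
big_a = list(range(300, 301))
big_b = list(range(300, 301))

print(small_a[0] is small_b[0])  # True: 10 comes from the small-int cache
print(big_a[0] is big_b[0])      # False: each run of range() makes a new 300
```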
`combinations(...)`, on the other hand, first creates and stores the number objects only once, and then reuses them over and over again in the pairs. You might think: "Hold on, it can reuse the numbers, but it produces the pairs as `tuple` objects, so it also creates one object per iteration!" Well... it has an optimization. It actually reuses the same `tuple` object over and over again, filling it with different numbers. "What? No way! Tuples are immutable!" Well... ostensibly they're immutable, yes. But if the `combinations` iterator sees that there are no other references to its result tuple, then it "cheats" and modifies it anyway. At the C code level, it can do that. And if nothing else has a reference to it, then there's no harm. Note that `for i, j in ...` unpacks the tuple and doesn't keep a reference to it. If you instead use `for pair in ...`, then `pair` is a reference to it, the optimization isn't applied, and indeed a new result tuple gets created every time. See the source code of combinations_next if you're interested.
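The results are identical either way; only the allocation behavior differs. A small sketch contrasting the two loop styles (`f_pairs` and `f_unpacked` are names I'm introducing here, not from the benchmark):

```python
from itertools import combinations

def f_pairs(n):
    # `pair` holds a reference to each result tuple, so combinations()
    # cannot reuse its tuple and must allocate a fresh one every iteration
    return [pair for pair in combinations(range(n), 2)]

def f_unpacked(n):
    # unpacking drops the reference right away, enabling tuple reuse
    return [(i, j) for i, j in combinations(range(n), 2)]
```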
I found four faster ways:
44.1 ms ± 0.2 ms f_combinations_pure
51.7 ms ± 0.1 ms f_list
52.7 ms ± 0.2 ms f_tee
53.6 ms ± 0.1 ms f_copy_iterator
54.6 ms ± 0.1 ms f_tuples
57.7 ms ± 0.3 ms f_combinations
66.6 ms ± 0.1 ms f_ranges
All four faster ways avoid what made the `range` solution slow: instead of creating (and deleting) Θ(n²) `int` objects, they reuse the same ones over and over again.

`f_tuples` puts them into a tuple and iterates over slices:
def f_tuples(n):
    nums = tuple(range(n))
    return [(i, j)
            for i in nums
            for j in nums[i+1:]]
`f_list` puts them into a list, and before each `j`-loop it removes the first number:
def f_list(n):
    js = list(range(n))
    return [(i, j)
            for i in range(n)
            if [js.pop(0)]
            for j in js]
`f_copy_iterator` puts them into a tuple, then uses an iterator for `i` and a copy of that iterator for `j` (which is an iterator starting one position after `i`):
def f_copy_iterator(n):
    nums = iter(tuple(range(n)))
    return [(i, j)
            for i in nums
            for j in copy(nums)]
`f_tee` uses itertools.tee for a similar effect as `copy`. Its `JS` is the main iterator of `j` values, and before each `j`-loop, it discards the first value and then tees `JS` to get a second iterator over the remaining values:
def f_tee(n):
    return [(i, j)
            for JS in [iter(range(n))]
            for i in range(n)
            for _, (JS, js) in [(next(JS), tee(JS))]
            for j in js]
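The comprehension tricks in `f_tee` are dense, so here's an explicit-loop equivalent (a sketch; `f_tee_loops` is a name I'm introducing) that does the same thing step by step:

```python
from itertools import tee

def f_tee_loops(n):
    # explicit-loop equivalent of f_tee, for readability (slower than
    # the comprehension version, which avoids repeated attribute lookups)
    result = []
    JS = iter(range(n))
    for i in range(n):
        next(JS)          # discard the first remaining value (it equals i)
        JS, js = tee(JS)  # JS continues as the main iterator, js feeds the j-loop
        for j in js:
            result.append((i, j))
    return result

print(f_tee_loops(4))  # [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```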
## Should you use one of these faster ways?

Meh, probably not. You're probably best off just using `for i, j in combinations(...)`. The faster ways aren't much faster, and they're somewhat more complicated. Plus, in reality you'll actually do something with `i` and `j` (like getting substrings), so the relatively small speed advantage becomes even smaller, relatively speaking.
But I hope you at least found this interesting and perhaps learned something new that is useful some day.
## Benchmark code

def f_combinations_pure(n):
    return list(combinations(range(n), 2))

def f_combinations(n):
    return [(i, j) for i, j in combinations(range(n), 2)]

def f_ranges(n):
    return [(i, j) for i in range(n) for j in range(i+1, n)]

def f_tuples(n):
    nums = tuple(range(n))
    return [(i, j) for i in nums for j in nums[i+1:]]

def f_list(n):
    js = list(range(n))
    return [(i, j) for i in range(n) if [js.pop(0)] for j in js]

def f_copy_iterator(n):
    nums = iter(tuple(range(n)))
    return [(i, j) for i in nums for j in copy(nums)]

def f_tee(n):
    return [(i, j)
            for JS in [iter(range(n))]
            for i in range(n)
            for _, (JS, js) in [(next(JS), tee(JS))]
            for j in js]
fs = [
    f_combinations_pure,
    f_combinations,
    f_ranges,
    f_tuples,
    f_list,
    f_copy_iterator,
    f_tee
]
from timeit import default_timer as time
from itertools import combinations, tee
from statistics import mean, stdev
from random import shuffle
from copy import copy
# Correctness
expect = fs[0](1000)
for f in fs:
    result = f(1000)
    assert result == expect

# Prepare for timing
times = {f: [] for f in fs}

def stats(f):
    ts = [t * 1e3 for t in sorted(times[f])[:5]]
    return f'{mean(ts):4.1f} ms ± {stdev(ts):3.1f} ms '
# Timing
for i in range(25):
    shuffle(fs)
    for f in fs:
        start = time()
        result = f(1000)
        end = time()
        times[f].append(end - start)
        del result

# Results
for f in sorted(fs, key=stats):
    print(stats(f), f.__name__)