Fastest way to get pairs of range(n)?

Imagine you want to process all pairs of the numbers 0 to n-1, for example for n = 4 that's these six pairs:

[(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

Three ways to create those pairs:

list(combinations(range(n), 2))

[(i, j) for i, j in combinations(range(n), 2)]

[(i, j) for i in range(n) for j in range(i+1, n)]

Benchmark results for n = 1000:

44.1 ms ± 0.2 ms  f_combinations_pure
57.7 ms ± 0.3 ms  f_combinations
66.6 ms ± 0.1 ms  f_ranges

Note I'm not really interested in just storing the pairs (i, j). That's just a minimal example usage of i and j so that we can compare different approaches without much overhead. In reality, you want to do something with i and j, for example [my_string[i:j] for ...] to get substrings (the question where comments inspired this). So the list(combinations(...)) one doesn't really count here, and I show it just to make that clear (although I still liked seeing how fast it is).

Question 1: Why is f_ranges slower than f_combinations? Its for i in runs only n times overall, so it's insignificant compared to the for j in, which runs n*(n-1)/2 times. And for j in range(...) only assigns one number, whereas for i, j in combinations(...) builds and assigns pairs of numbers, so the latter should be slower. Why is it faster?

Question 2: What's the fastest way you can come up with? For fair comparison, it shall be a list comprehension [(i, j) for ...] producing the same list of pairs.

(As I'm including an answer myself (which is encouraged), I'm including benchmark code there.)

Solution

About question 1: Why is `range` slower than `combinations`?

While for j in range(...) indeed has the advantage of assigning just one number, it has the disadvantage of creating them over and over again. In Python, numbers are objects, and their creation (and deletion) takes a little time.

combinations(...) on the other hand first creates and stores the number objects only once, and then reuses them over and over again in the pairs. You might think "Hold on, it can reuse the numbers, but it produces the pairs as tuple objects, so it also creates one object per iteration!". Well... it has an optimization. It actually reuses the same tuple object over and over again, filling it with different numbers. "What? No way! Tuples are immutable!" Well... ostensibly they're immutable, yes. But if the combinations iterator sees that there are no other references to its result tuple, then it "cheats" and modifies it anyway. At the C code level, it can do that. And if nothing else has a reference to it, then there's no harm. Note that for i, j in ... unpacks the tuple and doesn't keep a reference to it. If you instead use for pair in ..., then pair is a reference to it and the optimization isn't applied and indeed a new result tuple gets created every time. See the source code of combinations_next if you're interested.

About question 2: What's the fastest way?

I found four faster ways:

44.1 ms ± 0.2 ms  f_combinations_pure
51.7 ms ± 0.1 ms  f_list
52.7 ms ± 0.2 ms  f_tee
53.6 ms ± 0.1 ms  f_copy_iterator
54.6 ms ± 0.1 ms  f_tuples
57.7 ms ± 0.3 ms  f_combinations
66.6 ms ± 0.1 ms  f_ranges

All four faster ways avoid what made the range solution slow: Instead of creating (and deleting) Θ(n²) int objects, they reuse the same ones over and over again.

f_tuples puts them into a tuple and iterates over slices:

def f_tuples(n):
    nums = tuple(range(n))
    return [(i, j)
            for i in nums
            for j in nums[i+1:]]

f_list puts them into a list and then before each j-loop, it removes the first number:

def f_list(n):
    js = list(range(n))
    return [(i, j)
            for i in range(n)
            if [js.pop(0)]
            for j in js]

f_copy_iterator puts them into a tuple, then uses an iterator for i and a copy of that iterator for j (which is an iterator starting one position after i):

def f_copy_iterator(n):
    nums = iter(tuple(range(n)))
    return [(i, j)
            for i in nums
            for j in copy(nums)]

f_tee uses itertools.tee for a similar effect as copy. Its JS is the main iterator of j values, and before each j-loop, it discards the first value and then tees JS to get a second iterator of the remaining values:

def f_tee(n):
    return [(i, j)
            for JS in [iter(range(n))]
            for i in range(n)
            for _, (JS, js) in [(next(JS), tee(JS))]
            for j in js]

Bonus question: Is it worth it to optimize like those faster ways?

Meh, probably not. Probably you'd best just use for i, j in combinations(...). The faster ways aren't much faster, and they're somewhat more complicated. Plus, in reality, you'll actually do something with i and j (like getting substrings), so the relatively small speed advantage becomes even relatively smaller.

But I hope you at least found this interesting and perhaps learned something new that is useful some day.

Full benchmark code

Try it online!

def f_combinations_pure(n):
    return list(combinations(range(n), 2))


def f_combinations(n):
    return [(i, j) for i, j in combinations(range(n), 2)]


def f_ranges(n):
    return [(i, j) for i in range(n) for j in range(i+1, n)]


def f_tuples(n):
    nums = tuple(range(n))
    return [(i, j) for i in nums for j in nums[i+1:]]


def f_list(n):
    js = list(range(n))
    return [(i, j) for i in range(n) if [js.pop(0)] for j in js]


def f_copy_iterator(n):
    nums = iter(tuple(range(n)))
    return [(i, j) for i in nums for j in copy(nums)]


def f_tee(n):
    return [(i, j)
            for JS in [iter(range(n))]
            for i in range(n)
            for _, (JS, js) in [(next(JS), tee(JS))]
            for j in js]


fs = [
    f_combinations_pure,
    f_combinations,
    f_ranges,
    f_tuples,
    f_list,
    f_copy_iterator,
    f_tee
]

from timeit import default_timer as time
from itertools import combinations, tee
from statistics import mean, stdev
from random import shuffle
from copy import copy

# Correctness
expect = fs[0](1000)
for f in fs:
    result = f(1000)
    assert result == expect

# Prepare for timing
times = {f: [] for f in fs}
def stats(f):
    ts = [t * 1e3 for t in sorted(times[f])[:5]]
    return f'{mean(ts):4.1f} ms ± {stdev(ts):3.1f} ms '

# Timing
for i in range(25):
    shuffle(fs)
    for f in fs:
        start = time()
        result = f(1000)
        end = time()
        times[f].append(end - start)
        del result

# Results
for f in sorted(fs, key=stats):
    print(stats(f), f.__name__)

Fastest way to get pairs of range(n)?

About question 1: Why is range slower than combinations?

About question 2: What's the fastest way?

Bonus question: Is it worth it to optimize like those faster ways?

Full benchmark code

About question 1: Why is `range` slower than `combinations`?