I wanted to check if a given x
lies in the interval [0,a-1]
.
As a lazy coder I wrote
x in range(a)
and (because that piece of code was in 4.5 nested loops) quickly run into performance issues. I tested it and indeed, it turns out runtime of n in range(n)
lies in O(n), give or take. I actually thought my code would be optimized to x >= 0 and x < a
but it seems this is not the case. Even if I fix the range(a)
in advance the time doesn't become constant (though it improves a lot) - see side notes.
Should I use x >= 0 and x < a
and never write x in range(a)
ever again? Is there an even better way of writing it?
Side notes:
If I try following:
i = range(a)
...
x in i
so that the range is fixed and I only measure runtime of x in i
, I still get runtime in O(x) (assuming a
is large enough).
n in xrange(n)
lies in O(n) as well.The problem with range
in Python 2 is that it creates a list
of values, so x in range(a)
will create a list and linearly scan that list. xrange
should be a generator, but it is not much faster; probably still just linearly scans the values, just without creating the entire list first.
In [2]: %timeit 5*10**5 in range(10**6 + 1) # Python 2
10 loops, best of 3: 18.1 ms per loop
In [3]: %timeit 5*10**5 in xrange(10**6 + 1) # Python 2
100 loops, best of 3: 6.21 ms per loop
In Python 3, range
is much smarter, not only not creating the entire list, but also providing a fast implementation of the contains
check.
In [1]: %timeit 5*10**5 in range(10**6 + 1) # Python 3
1000000 loops, best of 3: 324 ns per loop
Even faster and IMHO more readable: Using comparison chaining:
In [2]: %timeit 0 <= 5*10**5 < 10**6 + 1 # Python 2 or 3
10000000 loops, best of 3: 46.6 ns per loop
Should I use
x >= 0 and x < a
and never write x in range(a) ever again? Is there an even better way of writing it?
"No", "it depends", and "yes". You should not use x >= 0 and x < a
because 0 <= x < a
is shorter and easier to parse (for puny humans), and is interpreted as (0 <= x) and (x < a)
. And you should not use in range
in Python 2, but in Python 3, you can use it if you like.
Still, I'd prefer comparison chaining, since a <= x < b
is much more explicit about bounds than x in range(a, b)
(what if x == b
?), which could prevent many off-by-one errors or +1
padding the range.
Also, note that 0 <= x < a
is not strictly the same as x in range(0, a)
, as a range
will only ever contain integer values, i.e. 1.5 in range(0, 5)
is False
, whereas 0 <= 1.5 < 5
is True
, which may of may not what you want. Also, using range
you can use steps other than 1
, e.g. 5 in range(4, 10, 2)
is False
, but the same can also be implemented using pure math, e.g. as (4 <= x < 10) and (x - 4 % 2 == 0)
.