I was reading an interesting post on Short-Circuiting in Python and wondered if this was true for the in
operator. My simple testing would conclude that it does not:
%%timeit -n 1000
0 in list(range(10))
1000 loops, best of 3: 639 ns per loop
%%timeit -n 1000
0 in list(range(1000))
1000 loops, best of 3: 23.7 µs per loop
# larger the list, the longer it takes. however, i do notice that a higher
# value does take longer.
%%timeit -n 1000
999 in list(range(1000))
1000 loops, best of 3: 45.1 µs per loop
Is there a detailed explanation of why 999
takes longer than 0
. Is the in
operator like a loop?
Also, is there a way to tell the in
operator to "stop the loop" once the value is found (or is this the already defaulted behavior that I'm not seeing)?
Lastly- Is there another operator/function that I am skipping over that does what I'm talking about in regards to "short-circuiting" in
?
The implementation of in
for list
objects is found in list_contains
. It performs a scan of the list and does exit early if the last comparison has found the element, there's no point in continuing there.
The loop involved is:
for (i = 0, cmp = 0 ; cmp == 0 && i < Py_SIZE(a); ++i)
cmp = PyObject_RichCompareBool(el, PyList_GET_ITEM(a, i),
Py_EQ);
If cmp
is 1
(the value returned from PyObject_RichCompareBool
for a match), the for
loop condition (cmp == 0 && i < Py_SIZE(a)
) becomes false and terminates.
For list objects, which are built-in, what is called for in
is a C
function (for CPython). For other implementations of Python, this can be a different language using different language constructs.
For user-defined classes in Python, what is called is defined in the Membership test operations of the Reference Manual, take a look there for a run-down of what gets called.
You could also come to this conclusion by timing:
l = [*range(1000)]
%timeit 1 in l
85.8 ns ± 11.9 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
%timeit 999 in l
22 µs ± 221 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
The furthest the element the more you need to scan. If it didn't short-circuit, all in
operations would result in similar timings.