In python, it is possible to chain operators in this manner:
a op b op c
Which is evaluated to
a op b and b op c
With the only difference being that b
is evaluated only once (so, something more like t = eval(b); a op t and t op c
).
This is advantageous from the view point that it is very readable and more concise than the equivalent version with explicit conjunction (using and
).
However... I've noticed that there is a minor performance difference between chained expressions and the equivalent, be it for 3 operands or 20. This becomes apparent when you time these operations.
import timeit
timeit.timeit("a <= b <= c", setup="a,b,c=1,2,3")
0.1086414959972899
timeit.timeit("a <= b and b <= c", setup="a,b,c=1,2,3")
0.09434155100097996
And,
timeit.timeit("a <= b <= c <= d <= e <= f", setup="a,b,c,d,e,f=1,2,3,4,5,6")
0.2151330839988077
timeit.timeit("a <= b and b <= c and c <= d and d <= e and e <= f", setup="a,b,c,d,e,f=1,2,3,4,5,6")
0.19196406500122976
Note: All tests were done with Python-3.4.
Examining the byte code for both expressions, I noticed that one performs significantly more (actually, 4 more) operations than the other.
import dis
dis.dis("a <= b <= c")
1 0 LOAD_NAME 0 (a)
3 LOAD_NAME 1 (b)
6 DUP_TOP
7 ROT_THREE
8 COMPARE_OP 1 (<=)
11 JUMP_IF_FALSE_OR_POP 21
14 LOAD_NAME 2 (c)
17 COMPARE_OP 1 (<=)
20 RETURN_VALUE
>> 21 ROT_TWO
22 POP_TOP
23 RETURN_VALUE
Contrast this with,
dis.dis("a <= b and b <= c")
1 0 LOAD_NAME 0 (a)
3 LOAD_NAME 1 (b)
6 COMPARE_OP 1 (<=)
9 JUMP_IF_FALSE_OR_POP 21
12 LOAD_NAME 1 (b)
15 LOAD_NAME 2 (c)
18 COMPARE_OP 1 (<=)
>> 21 RETURN_VALUE
I am not experienced with reading byte code, but the first code snippet definitely performs more operations at the byte code level than the second.
Here's how I've interpreted this. In the first case, variables are pushed onto some sort of stack, and popped successively for comparison. All variables are popped only once. In the second case, there is no stack, but at least (N - 2) of the operands have to be loaded into memory twice for comparison. It appears the stack popping operation is more expensive than loading (N - 2) variables twice for comparison, accounting for the speed difference.
In a nutshell, I'm trying to understand why one operation is always slower than the other by a constant factor. Is my hypothesis correct? Or is there something more to the python internals I'm missing?
More benchmarks:
| System | a <= b <= c | a <= b and b <= c | a <= b <= ... <= e <= f | a <= b and ... and e <= f | Credit |
|--------|---------------------|---------------------|-------------------------|---------------------------|----------------|
| 3.4 | 0.1086414959972899 | 0.09434155100097996 | 0.2151330839988077 | 0.19196406500122976 | @cᴏʟᴅsᴘᴇᴇᴅ |
| 3.6.2 | 0.06788300536572933 | 0.059271858073771 | 0.1505890181288123 | 0.12044331897050142 | @Bailey Parker |
| 2.7.10 | 0.05009198188781738 | 0.04472208023071289 | 0.11113405227661133 | 0.09062719345092773 | @Bailey Parker |
In CPython's stack-based bytecode execution engine, saving an extra reference to b
for the chained comparison isn't free. It's at the "seriously, don't worry about it" level of cheap, but it's not literally free, and you're comparing it to the slightly cheaper operation of loading a local variable.
The COMPARE_OP
opcode removes the objects it's comparing from the stack, so for the chained comparison, Python has to create another reference to b
(DUP_TOP
) and shove it two places down in the stack (ROT_THREE
) to get it out of the way.
In a <= b and b <= c
, instead of the above reference shuffling, Python just copies another reference to b
out of the stack frame's fastlocals
array. This involves less pointer shuffling and one less trip around the bytecode evaluation loop, so it's slightly cheaper.