Search code examples
pythoncomparisonpython-internals

Why are chained operator expressions slower than their expanded equivalent?


In python, it is possible to chain operators in this manner:

a op b op c

Which is evaluated to

a op b and b op c 

With the only difference being that b is evaluated only once (so, something more like t = eval(b); a op t and t op c).

This is advantageous from the view point that it is very readable and more concise than the equivalent version with explicit conjunction (using and).

However... I've noticed that there is a minor performance difference between chained expressions and the equivalent, be it for 3 operands or 20. This becomes apparent when you time these operations.

import timeit 

timeit.timeit("a <= b <= c", setup="a,b,c=1,2,3")
0.1086414959972899

timeit.timeit("a <= b and b <= c", setup="a,b,c=1,2,3")
0.09434155100097996

And,

timeit.timeit("a <= b <= c <= d <= e <= f", setup="a,b,c,d,e,f=1,2,3,4,5,6")
0.2151330839988077

timeit.timeit("a <= b and b <= c and c <= d and d <= e and e <= f", setup="a,b,c,d,e,f=1,2,3,4,5,6")
0.19196406500122976

Note: All tests were done with Python-3.4.

Examining the byte code for both expressions, I noticed that one performs significantly more (actually, 4 more) operations than the other.

import dis

dis.dis("a <= b <= c")
  1           0 LOAD_NAME                0 (a)
              3 LOAD_NAME                1 (b)
              6 DUP_TOP
              7 ROT_THREE
              8 COMPARE_OP               1 (<=)
             11 JUMP_IF_FALSE_OR_POP    21
             14 LOAD_NAME                2 (c)
             17 COMPARE_OP               1 (<=)
             20 RETURN_VALUE
        >>   21 ROT_TWO
             22 POP_TOP
             23 RETURN_VALUE 

Contrast this with,

dis.dis("a <= b and b <= c")
  1           0 LOAD_NAME                0 (a)
              3 LOAD_NAME                1 (b)
              6 COMPARE_OP               1 (<=)
              9 JUMP_IF_FALSE_OR_POP    21
             12 LOAD_NAME                1 (b)
             15 LOAD_NAME                2 (c)
             18 COMPARE_OP               1 (<=)
        >>   21 RETURN_VALUE

I am not experienced with reading byte code, but the first code snippet definitely performs more operations at the byte code level than the second.

Here's how I've interpreted this. In the first case, variables are pushed onto some sort of stack, and popped successively for comparison. All variables are popped only once. In the second case, there is no stack, but at least (N - 2) of the operands have to be loaded into memory twice for comparison. It appears the stack popping operation is more expensive than loading (N - 2) variables twice for comparison, accounting for the speed difference.

In a nutshell, I'm trying to understand why one operation is always slower than the other by a constant factor. Is my hypothesis correct? Or is there something more to the python internals I'm missing?


More benchmarks:

| System | a <= b <= c         | a <= b and b <= c   | a <= b <= ... <= e <= f | a <= b and ... and e <= f | Credit         |
|--------|---------------------|---------------------|-------------------------|---------------------------|----------------|
| 3.4    | 0.1086414959972899  | 0.09434155100097996 | 0.2151330839988077      | 0.19196406500122976       | @cᴏʟᴅsᴘᴇᴇᴅ     |
| 3.6.2  | 0.06788300536572933 | 0.059271858073771   | 0.1505890181288123      | 0.12044331897050142       | @Bailey Parker |
| 2.7.10 | 0.05009198188781738 | 0.04472208023071289 | 0.11113405227661133     | 0.09062719345092773       | @Bailey Parker |

Solution

  • In CPython's stack-based bytecode execution engine, saving an extra reference to b for the chained comparison isn't free. It's at the "seriously, don't worry about it" level of cheap, but it's not literally free, and you're comparing it to the slightly cheaper operation of loading a local variable.

    The COMPARE_OP opcode removes the objects it's comparing from the stack, so for the chained comparison, Python has to create another reference to b (DUP_TOP) and shove it two places down in the stack (ROT_THREE) to get it out of the way.

    In a <= b and b <= c, instead of the above reference shuffling, Python just copies another reference to b out of the stack frame's fastlocals array. This involves less pointer shuffling and one less trip around the bytecode evaluation loop, so it's slightly cheaper.