python, python-3.x, cpython, opcode, duck-typing

Understanding why these opcodes from different code are the same


I would like to understand in depth why the two functions below compile to the same opcodes (except for the values loaded/stored).

In particular, how can this BINARY_MULTIPLY be used for both str and int? Does CPython type-check under the hood and apply the correct function depending on whether the values are strings or ints?

And can we say that this mechanism is related to duck typing?

>>> def tata():
...     a = 1
...     b = 1
...     c = a * b
... 
>>> dis.dis(tata)
  2           0 LOAD_CONST               1 (1)
              3 STORE_FAST               0 (a)

  3           6 LOAD_CONST               1 (1)
              9 STORE_FAST               1 (b)

  4          12 LOAD_FAST                0 (a)
             15 LOAD_FAST                1 (b)
             18 BINARY_MULTIPLY     
             19 STORE_FAST               2 (c)
             22 LOAD_CONST               0 (None)
             25 RETURN_VALUE      

>>> def toto():
...     a = "1"
...     b = "1"
...     c = a * b
... 
>>> dis.dis(toto)
  2           0 LOAD_CONST               1 ('1')
              3 STORE_FAST               0 (a)

  3           6 LOAD_CONST               1 ('1')
              9 STORE_FAST               1 (b)

  4          12 LOAD_FAST                0 (a)
             15 LOAD_FAST                1 (b)
             18 BINARY_MULTIPLY     
             19 STORE_FAST               2 (c)
             22 LOAD_CONST               0 (None)
             25 RETURN_VALUE      
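A quick way to confirm this programmatically (a sketch using `dis.get_instructions`, available since Python 3.4) is to compare just the opcode names; only the arguments differ:

```python
import dis

def tata():
    a = 1
    b = 1
    c = a * b

def toto():
    a = "1"
    b = "1"
    c = a * b

# The opcode names are identical; only the constants (1 vs '1') differ.
ops_tata = [ins.opname for ins in dis.get_instructions(tata)]
ops_toto = [ins.opname for ins in dis.get_instructions(toto)]
print(ops_tata == ops_toto)  # True
```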

Solution

  • Python bytecode is extremely high level, and given the extremely dynamic semantics of the language it cannot do much differently. BINARY_MULTIPLY is emitted whenever you write * in your source code, whatever the types of the operands; what to do exactly is determined at runtime.
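    The runtime dispatch happens through the operands' special methods: when the multiply opcode executes (BINARY_MULTIPLY; BINARY_OP on Python 3.11+), the interpreter tries the left operand's `__mul__` and, if needed, the right operand's `__rmul__`. A minimal sketch (the `Doubler` class is invented for illustration):

```python
class Doubler:
    # A user-defined __mul__: the very same multiply opcode
    # ends up calling this method at runtime.
    def __mul__(self, other):
        return [other, other]

print(3 * 4)            # int.__mul__     -> 12
print("ab" * 2)         # str.__mul__     -> 'abab'
print(Doubler() * "x")  # Doubler.__mul__ -> ['x', 'x']
```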

    This is quite obvious in hindsight: in Python, types are generally known only at runtime, and given the flexibility the language allows (e.g. through monkeypatching), what to do can be determined only at the very moment of execution. Unsurprisingly, this is one of the reasons why CPython is so slow.
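    Monkeypatching makes this concrete: you can replace a type's `__mul__` after a function has been compiled, and the same bytecode then does something different on the next call (the `Vec` class below is invented for this sketch):

```python
class Vec:
    def __init__(self, x):
        self.x = x
    def __mul__(self, k):
        return Vec(self.x * k)

def scale(v, k):
    return v * k  # compiled once, to a single multiply opcode

print(scale(Vec(3), 2).x)  # 6

# Monkeypatch __mul__ after scale() was compiled: the *same*
# bytecode now behaves differently.
Vec.__mul__ = lambda self, k: Vec(self.x + k)
print(scale(Vec(3), 2).x)  # 5
```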

    In specific cases, such as those shown in your example, the compiler could perform type inference and do the calculations at compile time, or at least emit some (imaginary) more specific opcodes. Unfortunately, that would complicate the interpreter and wouldn't help much in the general case, as your computations generally involve parameters coming from the outside, such as:

    def square(x):
        return x*x
    

    x here could be of any type, so compile-time smartness isn't useful.
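    For instance, the one compiled version of square handles every type that defines multiplication (a sketch using standard-library types):

```python
from fractions import Fraction

def square(x):
    return x * x

print(square(4))               # 16
print(square(2.5))             # 6.25
print(square(Fraction(1, 3)))  # 1/9
print(square(3 + 1j))          # (8+6j)
```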

    def times5(x):
        return x * 5
    

    even if the 5 here is known, times5 will do completely different things depending on the type of x ("a" -> "aaaaa"; 2 -> 10; 4.5 -> 22.5; some custom class -> it depends on operator overloading, known only at runtime).
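    Those cases run exactly as described:

```python
def times5(x):
    return x * 5

print(times5("a"))  # 'aaaaa'
print(times5(2))    # 10
print(times5(4.5))  # 22.5
```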

    You could go the asm.js way and find oblique ways to provide type hints, but instead the high-performance implementation of Python (PyPy) just uses a tracing JIT approach to deduce by itself the parameter types that are commonly used (after running the code for a while) and generates optimized machine code tailored to the observed cases.