
How does the compilation process of Python source code work?


Lately, I've started exploring the compilation process of Python source code. While exploring, I've run into some confusing results.

    import dis
    
    
    def f():
        x = 1
        return x
    
    
    dis.dis(f)

Output:

  4           0 RESUME                   0

  5           2 LOAD_CONST               1 (1)
              4 STORE_FAST               0 (x)

  6           6 LOAD_FAST                0 (x)
              8 RETURN_VALUE


In the example above, I created a function that does the following:

  1. Creates a variable and binds it to the integer object whose value is one. Note that the interpreter doesn't create the integer object here: it was already created at the beginning of the program (an optimization) and is cached, so there is only one integer object whose value is one.

  2. Returns the object.
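To see where the operands in the disassembly come from, the function's code object can be inspected directly. This is only a minimal sketch, assuming CPython; the exact opcodes and the contents of the tuples vary between versions.

```python
import dis


def f():
    x = 1
    return x


code = f.__code__

# Names of the local variables; STORE_FAST / LOAD_FAST operands index into this tuple.
print(code.co_varnames)

# Constants stored by the compiler; LOAD_CONST operands index into this tuple.
print(code.co_consts)

# Each instruction with its raw operand and the readable form of that operand.
for ins in dis.get_instructions(f):
    print(ins.opname, ins.arg, ins.argrepr)
```

The number in front of the parentheses in the dis output is the raw operand, and the value in parentheses is what that operand resolves to.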

So far, no problem. The next example, on the other hand, confused me.

    import dis
    import sys
    
    print(sys.getrefcount(10 ** 32))
    
    
    def f():
        x = 10 ** 32
        return x
    
    
    dis.dis(f)

Output:

4
  7           0 RESUME                   0

  8           2 LOAD_CONST               1 (100000000000000000000000000000000)
              4 STORE_FAST               0 (x)

  9           6 LOAD_FAST                0 (x)
              8 RETURN_VALUE


As I understand it, getrefcount returning more than one means this object was created at the start of the program.

The questions are:

  1. Why isn't the arithmetic operation instruction visible when function f's bytecode is disassembled?

  2. Suppose the reason is that the interpreter creates the object at the beginning of the program and simply loads it.

  3. If that is the reason, how does the interpreter know that the result of 10 ** 32 already exists before executing the operation and obtaining the result?

  4. And why is such a large number created at the beginning of the program? The Python documentation states that integers in the range -5 to 256 are created at the start of the program. Obviously, 10 ** 32 is outside this range.


    import dis
    import sys
    
    print(sys.getrefcount(10 ** 33))
    
    
    def f():
        x = 10 ** 33
        return x
    
    
    dis.dis(f)

Output:

1
  7           0 RESUME                   0

  8           2 LOAD_CONST               1 (10)
              4 LOAD_CONST               2 (33)
              6 BINARY_OP                8 (**)
             10 STORE_FAST               0 (x)

  9          12 LOAD_FAST                0 (x)
             14 RETURN_VALUE

  1. getrefcount returning one means only one reference to 10 ** 33 exists: the temporary reference created when the object is passed to the getrefcount function.

     That means the interpreter didn't create this object at the start of the program, so the upper limit seems to be somewhere close to 10 ** 33 (contradicting what is written in the documentation).

  2. As you can see, when the result of the expression isn't created at the start of the program, the operation is visible.

I expected the arithmetic operation to be visible in the second example, as it is in the third, but it is not.

I expected integers greater than 256 not to be created at the start of the program, according to the documentation, but they are, and the actual upper limit appears to be much higher.
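One way to cross-check the reference counts from the second and third examples is to look at whether the value is stored in the compiled code's constants tuple. If it is, the code object itself holds a reference to it, which would account for a count above one without any startup cache being involved. This is a rough sketch, assuming CPython; the helper `holds` and the two function names are made up for illustration.

```python
import sys


def with_10_pow_32():
    return 10 ** 32


def with_10_pow_33():
    return 10 ** 33


def holds(func, value):
    """Return True if `value` is stored as an int in func's constants tuple."""
    return any(type(c) is int and c == value for c in func.__code__.co_consts)


# Is the finished result stored in the function's code object?
print(holds(with_10_pow_32, 10 ** 32))
print(holds(with_10_pow_33, 10 ** 33))

# Reference counts as measured in the question; exact numbers depend on the version.
print(sys.getrefcount(10 ** 32), sys.getrefcount(10 ** 33))
```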


Solution

  • I found out that Python performs an optimization called "constant folding", which is responsible for the difference between the last two examples. Since other sources explain it much better than I can, I will simply share links to them.

    Here is the video where I found the reason: https://youtu.be/HVUTjQzESeo?t=1747

    Another resource: What are the specific rules for constant folding?
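Constant folding can also be observed without writing a function for each case. The sketch below compiles an expression and checks whether a binary-operation opcode survives in the bytecode. It assumes CPython, where arithmetic opcode names start with "BINARY_"; the helper name `is_folded` is made up for illustration, and which expressions get folded is an implementation detail that may differ between versions.

```python
import dis


def is_folded(expression):
    """Return True if no binary-operation opcode remains after compilation.

    Only meant for plain arithmetic expressions; subscripting also uses a
    BINARY_* opcode on some versions and would be misreported.
    """
    code = compile(expression, "<example>", "eval")
    return not any(
        ins.opname.startswith("BINARY_") for ins in dis.get_instructions(code)
    )


# `n` is never evaluated here; the expressions are only compiled.
for expression in ("1000 * 1000", "10 ** 32", "10 ** 33", "n * 1000"):
    verdict = "folded at compile time" if is_folded(expression) else "computed at run time"
    print(expression, "->", verdict)
```

An expression involving a name, such as n * 1000, cannot be folded because its value is unknown at compile time; for expressions made only of literals, the compiler appears to skip folding when the result would be too large, which matches the difference seen between 10 ** 32 and 10 ** 33.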