Search code examples
pythonvirtual-machinecpubytecode

Does Python Virtual Machine require a CPU to execute the bytecode?


Does Python Virtual Machine require a CPU to execute the bytecode? Is the bytecode converted into the machine code and then the CPU gets involved in the process?


Solution

  • In order to run an application on any computer, its code must always be somehow converted to machine code and then be executed by the CPU. The question is rather when and how this happens.

    Let me try and show you how Python effectively executes bytecode.

    Compiler vs Interpreter

    Imagine the CPU in your computer understands nothing but Latin. You want to send it a letter with detailed instructions or a request, but you do not speak Latin. So, you will engage a translator: someone who translates your "English" letter (or whatever language you use) to Latin for you.

    Compiled languages like C or Rust take your entire letter, translate all of it to Latin and really polish it. The outcome is a translated letter that is highly poetic and uses sophisticated language. An interpreter like Python, on the other hand, translates one word or one sentence at a time; it is more really like an interpreter as you encounter in the news that translates what someone in a foreign language says as they speak.

    Bytecode

    The full translation process from languages like C, Rust, or Python to machine code is quite complex and requires to carefully analyse the original program code. In order to avoid having to analyse your program code over and over again, the Python interpreter will do it just once, and then generate bytecode that is a very close representation of your Python code, but split up into the basic elements.

    Let's take a look at a very simple Python function:

    def f(x):
        y = (x + 1)*(x - 1)
        return y
    

    The computation in this function comprises several calculations, which all have to be performed in the correct order. The bytecode reflects this:

        LOAD_VAR     x    # x+1
        LOAD_CONST   1
        ADD
        LOAD_VAR     x    # x-1
        LOAD_CONST   1
        SUBTRACT
        MULTIPLY          # ()*()
        STORE_VAR    y    # y = ...
        LOAD_VAR     y
        RETURN
    

    Indeed, the bytecode in Python is usually a very close representation of the Python code itself, just broken up into pieces of 'atomic' simple operations.

    Internally, each bytecode instruction has a numeric value (that actually fits into a byte, hence the name). For instance, LOAD_VAR = 124, LOAD_CONST = 100, ADD = 23, etc. And the local variables and constant value are also expressed through numbers. Thus, if we assign x = 01 and y = 02, the above code becomes:

      124,  01, 100,  01,  23, 124,  01, 100,  01,  
       24,  20, 125,  02, 124,  02,  83
    

    Executing Bytecode

    Below you will find a simple and minimalistic interpreter for 'Python bytecode' that is capable of executing the function we have defined in the beginning. The actual bytecode interpreter of Python is written in C and thus compiled to highly efficient machine code. But the principle is exactly the same.

    It uses a stack to hold intermediate values. That is, the result of each operation is appended to a list. An operation that further processes these results takes them off the end of the list, does something (like add them together), and appends the result then back to the list (but you have to be careful when doing things like subtraction or division to keep the right order).

    It is convenient to arrange the bytecode into pairs of instructions and arguments. Some instructions (like ADD) do not have an argument, so we just use 0 in that case. But the code used here is still the bytecode presented above.

    def execute(bytecode, consts, vars):
        stack = []
        for (instr, arg) in bytecode:
            if instr == 20:
                stack.append(stack.pop() * stack.pop())
            elif instr == 23:
                stack.append(stack.pop() + stack.pop())
            elif instr == 24:
                second = stack.pop()
                first  = stack.pop()
                stack.append(first - second)
            elif instr == 83:
                return stack.pop()
            elif instr == 100:
                stack.append( consts[arg] )
            elif instr == 124:
                stack.append( vars[arg] )
            elif instr == 125:
                vars[arg] = stack.pop()
    
    my_bytecode = [
      (124, 1), (100, 1), (23, 0), (124, 1), (100, 1), 
       (24, 0), (20, 0), (125, 2), (124, 2),  (83, 0)
    ]
    my_consts = [ None, 1 ]   
    my_vars   = [ x, 0 ]
    execute(my_bytecode, my_consts, my_vars)
    

    You can actually look at the lists of constant values (although they are actually tuples, not lists), or in what order the local variables are defined using:

    print(f.__code__.co_code)      # prints the bytecode
    print(f.__code__.co_consts)    # prints (None, 1)
    print(f.__code__.co_varnames)  # prints ('x', 'y')
    

    A tad more convenient is to use the inspect and dis modules, of course.