python, cpython, python-internals

CPython: Why does a 3-line script require far more than 3 cycles in the interpreter to execute?


I just watched this YouTube lecture about CPython internals by Philip Guo, and I am puzzled by one thing.

At 25:55, he modifies the C source of CPython by inserting printf("hello\n") at the start of the endless loop that executes the bytecode instructions. You can do the same by:

  • Downloading the Python 2.7 C source code
  • Opening the file Python/ceval.c
  • Finding the start of the endless evaluation loop, for (;;) {
  • Adding the line printf("hello\n"); as the first line of the endless loop
  • Running configure and make to build the Python binary

He then writes a 3-line test.py:

X = 1
Y = 2
print X + Y

The puzzle is: when he runs test.py with the modified interpreter, why are so many "hello" lines printed before we see "3"?

Those 3 lines should compile to just a few bytecode instructions: load the value 1, load the value 2, and call print. So when the interpreter executes the bytecode compiled from test.py, I would expect to see just a few "hello" lines.

So does the compiler actually generate many internal bytecode instructions before compiling the external Python script?


Solution

  • There are two reasons you see so many hellos printed:

    • Python doesn't have a dedicated bytecode for every possible Python statement. Instead, each statement compiles to a combination of bytecodes.
    • The Python interpreter imports a series of Python modules just to start running. You can run a regular Python interpreter with the -v switch to see what is imported each time. Each module consists of multiple statements, so there is quite a lot of bytecode to go through before the interpreter gets to the little script you are running.

    If I put those 3 lines into test.py and use my unmodified Python 2.7 binary to run that, with the -v switch, I see:

    $ python2.7 -v test.py
    # installing zipimport hook
    import zipimport # builtin
    # installed zipimport hook
    # /..../lib/python2.7/site.pyc matches /..../lib/python2.7/site.py
    import site # precompiled from /..../lib/python2.7/site.pyc
    # /..../lib/python2.7/os.pyc matches /..../lib/python2.7/os.py
    import os # precompiled from /..../lib/python2.7/os.pyc
    import errno # builtin
    import posix # builtin
    # /..../lib/python2.7/posixpath.pyc matches /..../lib/python2.7/posixpath.py
    import posixpath # precompiled from /..../lib/python2.7/posixpath.pyc
    # /..../lib/python2.7/stat.pyc matches /..../lib/python2.7/stat.py
    import stat # precompiled from /..../lib/python2.7/stat.pyc
    # /..../lib/python2.7/genericpath.pyc matches /..../lib/python2.7/genericpath.py
    import genericpath # precompiled from /..../lib/python2.7/genericpath.pyc
    # /..../lib/python2.7/warnings.pyc matches /..../lib/python2.7/warnings.py
    import warnings # precompiled from /..../lib/python2.7/warnings.pyc
    # /..../lib/python2.7/linecache.pyc matches /..../lib/python2.7/linecache.py
    import linecache # precompiled from /..../lib/python2.7/linecache.pyc
    # /..../lib/python2.7/types.pyc matches /..../lib/python2.7/types.py
    import types # precompiled from /..../lib/python2.7/types.pyc
    # /..../lib/python2.7/UserDict.pyc matches /..../lib/python2.7/UserDict.py
    import UserDict # precompiled from /..../lib/python2.7/UserDict.pyc
    # /..../lib/python2.7/_abcoll.pyc matches /..../lib/python2.7/_abcoll.py
    import _abcoll # precompiled from /..../lib/python2.7/_abcoll.pyc
    # /..../lib/python2.7/abc.pyc matches /..../lib/python2.7/abc.py
    import abc # precompiled from /..../lib/python2.7/abc.pyc
    # /..../lib/python2.7/_weakrefset.pyc matches /..../lib/python2.7/_weakrefset.py
    import _weakrefset # precompiled from /..../lib/python2.7/_weakrefset.pyc
    import _weakref # builtin
    # /..../lib/python2.7/copy_reg.pyc matches /..../lib/python2.7/copy_reg.py
    import copy_reg # precompiled from /..../lib/python2.7/copy_reg.pyc
    import encodings # directory /..../lib/python2.7/encodings
    # /..../lib/python2.7/encodings/__init__.pyc matches /..../lib/python2.7/encodings/__init__.py
    import encodings # precompiled from /..../lib/python2.7/encodings/__init__.pyc
    # /..../lib/python2.7/codecs.pyc matches /..../lib/python2.7/codecs.py
    import codecs # precompiled from /..../lib/python2.7/codecs.pyc
    import _codecs # builtin
    # /..../lib/python2.7/encodings/aliases.pyc matches /..../lib/python2.7/encodings/aliases.py
    import encodings.aliases # precompiled from /..../lib/python2.7/encodings/aliases.pyc
    # /..../lib/python2.7/encodings/utf_8.pyc matches /..../lib/python2.7/encodings/utf_8.py
    import encodings.utf_8 # precompiled from /..../lib/python2.7/encodings/utf_8.pyc
    Python 2.7.15 (default, May  7 2018, 17:08:03)
    [GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.1)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    3
    # -- clean-up output omitted --
    

    Each import ... line in there references either a built-in module (part of the Python binary, implemented in C) or a .pyc bytecode cache file. There are 17 such files being imported before the script code is even run.
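
    The same kind of count can be reproduced programmatically. The following is a minimal sketch, assuming a Python 3.7+ interpreter (the output above is from 2.7, so the module names and totals will differ). The helper name startup_import_lines is only illustrative. It starts a fresh interpreter with -v, runs an empty program, and keeps the lines of the trace that begin with import:

```python
import subprocess
import sys


def startup_import_lines(python=sys.executable):
    """Return the 'import ...' lines a fresh interpreter prints under -v."""
    # -v writes its trace to stderr; "-c pass" runs an empty program,
    # so every import reported comes from the interpreter itself.
    result = subprocess.run(
        [python, "-v", "-c", "pass"],
        capture_output=True,
        text=True,
        errors="replace",
    )
    return [
        line for line in result.stderr.splitlines()
        if line.startswith("import ")
    ]


lines = startup_import_lines()
print(len(lines), "import lines reported by -v for an empty program")
```

    Every module counted here that is loaded from a bytecode cache file has its top-level code run through the evaluation loop, which is where the extra "hello" lines come from.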

    The 3 lines of code in the main script translate to a further 9 bytecode instructions:

    >>> import dis
    >>> dis.dis(compile(r'''\
    ... X = 1
    ... Y = 2
    ... print X + Y
    ... ''', '', 'exec'))
      2           0 LOAD_CONST               0 (1)
                  3 STORE_NAME               0 (X)
    
      3           6 LOAD_CONST               1 (2)
                  9 STORE_NAME               1 (Y)
    
      4          12 LOAD_NAME                0 (X)
                 15 LOAD_NAME                1 (Y)
                 18 BINARY_ADD
                 19 PRINT_ITEM
                 20 PRINT_NEWLINE
                 21 LOAD_CONST               2 (None)
                 24 RETURN_VALUE
    

    (I ignored the 2 bytecodes at the end; they encode an implicit return None, which isn't really applicable to a module.)
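
    For comparison, here is a small sketch that counts the instructions instead of reading them off the listing. It assumes Python 3, where print is a function, so the script is spelled print(X + Y). The opcode names and totals differ from the 2.7 disassembly above and vary between 3.x releases:

```python
import dis

# Python 3 spelling of the 3-line script (an assumption; the original is 2.7).
SOURCE = "X = 1\nY = 2\nprint(X + Y)\n"

code = compile(SOURCE, "<test.py>", "exec")

# dis.get_instructions yields one Instruction object per bytecode instruction.
opnames = [ins.opname for ins in dis.get_instructions(code)]

print(len(opnames), "instructions for", len(SOURCE.splitlines()), "lines")
for name in opnames:
    print("   ", name)
```

    Whatever the exact total on a given version, it is a handful of instructions, which is tiny next to the bytecode executed for the startup imports.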