I just watched this YouTube lecture about CPython internals by Philip Guo, and I am puzzled by one thing.
At 25:55, he modifies the C source of CPython by inserting printf("hello\n") at the start of the endless loop that runs all the bytecode instructions. You can do the same by opening Python/ceval.c, searching for
for (;;) {
and adding
printf("hello\n");
as the first line of the endless loop. Then run configure and make to build the Python binary.

He writes a 3-line test.py:
X = 1
Y = 2
print X + Y
The puzzle is: when he runs test.py with the modified interpreter, why are there so many "hello" lines before we see "3"?
Those 3 lines should compile to just a few bytecode instructions: load the value 1, load the value 2, and call print. So when the bytecode compiled from test.py is executed, I would expect to see just a few "hello" lines.
So does the compiler actually generate many internal bytecode instructions before compiling the external Python script?
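There are two reasons you see so many hello lines printed.
First, Python imports a number of modules at startup, before your script runs. Use the -v switch to see what is imported each time. Each module consists of multiple statements, so there is quite some bytecode to go through before you get to the little script you are running.
Second, the 3 lines of the script themselves compile to more bytecode instructions than you might expect.

To illustrate the first point: if I put those 3 lines into test.py and run it with my unmodified Python 2.7 binary and the -v switch, I see: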
$ python2.7 -v test.py
# installing zipimport hook
import zipimport # builtin
# installed zipimport hook
# /..../lib/python2.7/site.pyc matches /..../lib/python2.7/site.py
import site # precompiled from /..../lib/python2.7/site.pyc
# /..../lib/python2.7/os.pyc matches /..../lib/python2.7/os.py
import os # precompiled from /..../lib/python2.7/os.pyc
import errno # builtin
import posix # builtin
# /..../lib/python2.7/posixpath.pyc matches /..../lib/python2.7/posixpath.py
import posixpath # precompiled from /..../lib/python2.7/posixpath.pyc
# /..../lib/python2.7/stat.pyc matches /..../lib/python2.7/stat.py
import stat # precompiled from /..../lib/python2.7/stat.pyc
# /..../lib/python2.7/genericpath.pyc matches /..../lib/python2.7/genericpath.py
import genericpath # precompiled from /..../lib/python2.7/genericpath.pyc
# /..../lib/python2.7/warnings.pyc matches /..../lib/python2.7/warnings.py
import warnings # precompiled from /..../lib/python2.7/warnings.pyc
# /..../lib/python2.7/linecache.pyc matches /..../lib/python2.7/linecache.py
import linecache # precompiled from /..../lib/python2.7/linecache.pyc
# /..../lib/python2.7/types.pyc matches /..../lib/python2.7/types.py
import types # precompiled from /..../lib/python2.7/types.pyc
# /..../lib/python2.7/UserDict.pyc matches /..../lib/python2.7/UserDict.py
import UserDict # precompiled from /..../lib/python2.7/UserDict.pyc
# /..../lib/python2.7/_abcoll.pyc matches /..../lib/python2.7/_abcoll.py
import _abcoll # precompiled from /..../lib/python2.7/_abcoll.pyc
# /..../lib/python2.7/abc.pyc matches /..../lib/python2.7/abc.py
import abc # precompiled from /..../lib/python2.7/abc.pyc
# /..../lib/python2.7/_weakrefset.pyc matches /..../lib/python2.7/_weakrefset.py
import _weakrefset # precompiled from /..../lib/python2.7/_weakrefset.pyc
import _weakref # builtin
# /..../lib/python2.7/copy_reg.pyc matches /..../lib/python2.7/copy_reg.py
import copy_reg # precompiled from /..../lib/python2.7/copy_reg.pyc
import encodings # directory /..../lib/python2.7/encodings
# /..../lib/python2.7/encodings/__init__.pyc matches /..../lib/python2.7/encodings/__init__.py
import encodings # precompiled from /..../lib/python2.7/encodings/__init__.pyc
# /..../lib/python2.7/codecs.pyc matches /..../lib/python2.7/codecs.py
import codecs # precompiled from /..../lib/python2.7/codecs.pyc
import _codecs # builtin
# /..../lib/python2.7/encodings/aliases.pyc matches /..../lib/python2.7/encodings/aliases.py
import encodings.aliases # precompiled from /..../lib/python2.7/encodings/aliases.pyc
# /..../lib/python2.7/encodings/utf_8.pyc matches /..../lib/python2.7/encodings/utf_8.py
import encodings.utf_8 # precompiled from /..../lib/python2.7/encodings/utf_8.pyc
Python 2.7.15 (default, May 7 2018, 17:08:03)
[GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
3
# -- clean-up output omitted --
Each import ... line in there references either a built-in module (part of the Python binary, implemented in C) or a .pyc bytecode cache file. There are 17 such .pyc files being imported before the script code is even run.
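If you would rather tally those import lines than count them by eye, a small parser is enough. This is only a sketch, written for Python 3 and assuming the Python 2.7 -v line format shown above; summarize_verbose_imports and SAMPLE are names I made up for the illustration, and SAMPLE holds just a handful of lines copied from that output.

```python
from collections import Counter


def summarize_verbose_imports(text):
    """Count 'import ...' lines in Python 2.7 -v output, grouped by kind."""
    kinds = Counter()
    for line in text.splitlines():
        line = line.strip()
        # Skip '# ... matches ...' comment lines and anything else.
        if not line.startswith("import ") or " # " not in line:
            continue
        # The first word after '# ' says where the module came from:
        # 'builtin', 'precompiled' (a .pyc file) or 'directory' (a package).
        note = line.split(" # ", 1)[1]
        kinds[note.split()[0]] += 1
    return kinds


# A few lines copied from the -v output above (assumed format).
SAMPLE = """\
import zipimport # builtin
# /..../lib/python2.7/site.pyc matches /..../lib/python2.7/site.py
import site # precompiled from /..../lib/python2.7/site.pyc
import errno # builtin
import encodings # directory /..../lib/python2.7/encodings
import encodings # precompiled from /..../lib/python2.7/encodings/__init__.pyc
"""

counts = summarize_verbose_imports(SAMPLE)
print(counts["precompiled"], counts["builtin"], counts["directory"])
```

Feeding it the captured -v output of a real run gives the number of .pyc files under the "precompiled" key; every one of those modules executes its own bytecode in the same loop.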
The 3 lines of code in the main script translate to a further 9 bytecode instructions:
>>> import dis
>>> dis.dis(compile(r'''\
... X = 1
... Y = 2
... print X + Y
... ''', '', 'exec'))
  2           0 LOAD_CONST               0 (1)
              3 STORE_NAME               0 (X)

  3           6 LOAD_CONST               1 (2)
              9 STORE_NAME               1 (Y)

  4          12 LOAD_NAME                0 (X)
             15 LOAD_NAME                1 (Y)
             18 BINARY_ADD
             19 PRINT_ITEM
             20 PRINT_NEWLINE
             21 LOAD_CONST               2 (None)
             24 RETURN_VALUE
(I ignored the 2 bytecodes at the end, which encode an extra return None that's not really applicable to a module.)
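The instructions can also be counted programmatically instead of by reading the listing. The following is a sketch that assumes a Python 3 interpreter, where print is a function call, so the opcodes and the total differ from the Python 2.7 listing above and vary between Python 3 releases. SOURCE and the other variable names are my own.

```python
import dis

# The same script, spelled for Python 3 (print is a function there).
SOURCE = "X = 1\nY = 2\nprint(X + Y)\n"

# Compile to a module-level code object, as the interpreter does for a script.
code = compile(SOURCE, "test.py", "exec")

# dis.get_instructions yields one Instruction per bytecode instruction.
instructions = list(dis.get_instructions(code))
opnames = [ins.opname for ins in instructions]

print(len(instructions), "instructions")
for name in opnames:
    print(name)
```

Whatever the exact count on your version, it is larger than the three "load, load, print" steps the question assumed, because storing and re-loading the names X and Y takes instructions too.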