Search code examples
pythoncdecompiling

Why do C programs require decompilers but python programs dont?


If I write a python script, anyone can simply point an editor to it and read it. But for programming written in C, one would have to use decompilers and hex tables and such. Why is that? I mean I simply can't open up the Safari web browser and look at its code.


Solution

  • Note: The author disavows a deep expertise in this subject. Some assertions may be incorrect.

    Python actually is compiled into bytecode, which is what gets run by the python interpreter. Whenever you use a Python module, Python will generate a .pyc file with a name corresponding to the module. This is the equivalent of the .o file that's generated when you compile a C file.

    So if you want something to disassemble, the .pyc file would be it :)

    The process that Python goes through when compiling a module is pretty similar to what gcc or another C compiler does with C source code. The major difference is that it happens transparently as part of execution of the file. It's also optional: when running a non-module, i.e. an end-user script, Python will just interpret the code rather than compiling it first.

    So really your question is "Why are python programs distributed as source rather than as compiled modules?" Or, put another way, "Why are C applications distributed as compiled binaries rather than as source code?"

    It used to be very common for C applications to be distributed as source code. This was back before operating systems and their various subentities (i.e. linux distributions) became more established. Some distros, for example gentoo, still distribute apps as source code. Apps which are a bit more cutting edge or obscure are still distributed as source code for all platforms they target.

    The reason for this is compatibility, and dependencies. The reason you can run the precompiled binary Safari on a Mac, or Firefox on Ubuntu Linux, is because it's been specifically built for that operating system, architecture (e.g. x86_64), and set of libraries.

    Unfortunately, compilation of a large app is pretty slow, and needs to be redone at least partially every time the app is updated. Thus the motivation for binary distributions.

    So why not create a binary distribution of Python? For one thing, as Aaron mentions, modules would need to be recompiled for each new version of the Python bytecode. But this would be similar to rebuilding a C app to link with a newer version of a dynamic library — Python modules are analogous in this sense to C libraries.

    The real reason is that Python compilation is very much quicker than C compilation. This is in part, I think, because of the dynamic nature of the language, and also because it's not as thorough of a compilation. This has its tradeoffs: in particular, Python apps run much more slowly than do their C counterparts, because Python has to interpret the compiled bytecode into instructions for the processor, whereas the C app already contains such instructions.

    That all being said, there is a program called py2exe that will take a Python module and distribution and build a precompiled windows executable, including in it the logic of the module and its dependencies, including Python itself. I guess the point of this is to avoid having to coerce people into installing Python on their Windows system just to run your app. Under linux, or I think even OS/X, Python is usually already installed, so precompilation is not really necessary. Linux systems also have super-dandy package managers that will transparently install dependencies such as Python if they are not already installed.