Tags: python, pypy, language-implementation

PyPy -- How can it possibly beat CPython?


From the Google Open Source Blog:

PyPy is a reimplementation of Python in Python, using advanced techniques to try to attain better performance than CPython. Many years of hard work have finally paid off. Our speed results often beat CPython, ranging from being slightly slower, to speedups of up to 2x on real application code, to speedups of up to 10x on small benchmarks.

How is this possible? Which Python implementation was used to implement PyPy? CPython? And what are the chances of a PyPyPy or PyPyPyPy beating their score?

(On a related note... why would anyone try something like this?)


Solution

    Q1. How is this possible?

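    CPython manages memory by reference counting, and the interpreter maintains those counts by hand: every time a reference is created or dropped, its C code explicitly increments or decrements the object's count. This manual bookkeeping can be slower in some cases than the automatic, tracing garbage collection that PyPy uses.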
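    To make that bookkeeping concrete, here is a small CPython-specific sketch using `sys.getrefcount`, which reports an object's current reference count. The function name `refcount_steps` is only for illustration. The absolute counts vary between CPython versions (and `sys.getrefcount` is not meaningful on PyPy, which does not use reference counting), so the sketch only compares differences.

```python
import sys


def refcount_steps():
    """Record how CPython's reference count for one object changes as
    references are added and dropped. CPython-specific: sys.getrefcount
    counts a temporary reference for its own argument and the base value
    differs between versions, so only the differences are meaningful."""
    obj = []
    base = sys.getrefcount(obj)
    alias = obj                  # one more reference: count goes up by 1
    with_alias = sys.getrefcount(obj)
    holder = [obj, obj]          # a list holding two more references: +2
    with_holder = sys.getrefcount(obj)
    del alias, holder            # dropping them decrements the count again
    after_del = sys.getrefcount(obj)
    return base, with_alias, with_holder, after_del


base, with_alias, with_holder, after_del = refcount_steps()
print(with_alias - base, with_holder - base, after_del - base)
```

    On CPython the three differences should come out as 1, 3 and 0. Each of those increments and decrements is extra work attached to ordinary assignments, which a tracing collector largely avoids.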

    Limitations in the implementation of the CPython interpreter preclude certain optimisations that PyPy can do (e.g. fine-grained locks).

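    As Marcelo mentioned, the JIT. Being able to confirm the type of an object on the fly can save you from doing multiple pointer dereferences to finally arrive at the method you want to call: once the JIT knows the type, it can check it with a single cheap guard and then call the method directly.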
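    As a rough Python-level analogy (not how PyPy's tracing JIT is actually implemented), the sketch below contrasts a generic method lookup that walks the class hierarchy on every call with a hypothetical `GuardedCallSite` that remembers the last type it saw. When the single `type(obj) is cached_type` guard passes, it calls the cached function directly, and only a guard failure falls back to the slow lookup. This caching technique is commonly called an inline cache.

```python
class Shape:
    def area(self):
        return self.width * self.height


class Rectangle(Shape):
    def __init__(self, width, height):
        self.width = width
        self.height = height


def generic_lookup(obj, name):
    """Slow path: walk the class hierarchy (MRO) dict by dict, roughly what
    an interpreter must do on each call when it knows nothing about the type.
    For a Rectangle this misses in Rectangle and finds `area` in Shape."""
    for klass in type(obj).__mro__:
        if name in klass.__dict__:
            return klass.__dict__[name]
    raise AttributeError(name)


class GuardedCallSite:
    """Hypothetical call site that caches the function found for one type."""

    def __init__(self, name):
        self.name = name
        self.cached_type = None
        self.cached_func = None
        self.slow_lookups = 0

    def call(self, obj):
        if type(obj) is not self.cached_type:  # guard failed: do the slow lookup
            self.cached_func = generic_lookup(obj, self.name)
            self.cached_type = type(obj)
            self.slow_lookups += 1
        return self.cached_func(obj)  # guard passed: direct call


site = GuardedCallSite("area")
shapes = [Rectangle(n, n) for n in range(1, 1001)]
total = sum(site.call(s) for s in shapes)
print(total, site.slow_lookups)
```

    Over 1000 calls on the same type, only the first call performs the full lookup. A JIT goes further by emitting machine code for the guard and the direct call, but the saving comes from the same place.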

    Q2. Which Python implementation was used to implement PyPy?

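    The PyPy interpreter is implemented in RPython, a statically typed subset of Python (the language, not the CPython interpreter). The RPython toolchain translates that code to C and compiles it into a standalone executable, so the finished PyPy does not run on top of CPython; a Python interpreter is only needed to run the translation step. See https://pypy.readthedocs.org/en/latest/architecture.html for details.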
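    One consequence of RPython being statically typed is that a variable may not hold, say, an int on one control-flow path and a str on another, even though plain Python allows it. The toy checker below flags names that are assigned constants of more than one type. `mixed_type_names` is a hypothetical helper that is far cruder than RPython's real type inference, and it ignores function scopes.

```python
import ast


def mixed_type_names(source):
    """Toy check: report names assigned constants of more than one type
    anywhere in `source`. Loosely mimics one RPython rule (a variable has a
    single static type); the real annotator infers types for every
    expression, not just constant assignments, and respects scopes."""
    seen = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Constant):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    seen.setdefault(target.id, set()).add(type(node.value.value).__name__)
    return {name: kinds for name, kinds in seen.items() if len(kinds) > 1}


rpython_friendly = """
def f(flag):
    x = 0
    if flag:
        x = 1
    return x
"""

plain_python_only = """
def g(flag):
    x = 0
    if flag:
        x = "one"   # valid Python, but x is int on one path and str on the other
    return x
"""

print(mixed_type_names(rpython_friendly))   # {}
print(mixed_type_names(plain_python_only))
```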

    Q3. And what are the chances of a PyPyPy or PyPyPyPy beating their score?

    That would depend on the implementation of these hypothetical interpreters. If one of them, for example, took the source, analysed it, and after running for a while converted it directly into tight, target-specific assembly code, I imagine it would be quite a bit faster than CPython.
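    A toy version of that idea, assuming a made-up stack language with `push`, `load_x`, `add` and `mul` opcodes: the hypothetical `Runner` interprets a program opcode by opcode until it has run `HOT_THRESHOLD` times (an arbitrary value chosen here), then translates it once into straight-line Python source and compiles it with `exec`. It targets Python bytecode rather than assembly, but the shape is the same: pay for the analysis once, then skip the per-opcode dispatch on every later run.

```python
HOT_THRESHOLD = 3  # arbitrary warm-up count for this sketch


def interpret(program, x):
    """Slow path: dispatch on every opcode each time the program runs."""
    stack = []
    for op, arg in program:
        if op == "push":
            stack.append(arg)
        elif op == "load_x":
            stack.append(x)
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(op)
    return stack.pop()


def compile_program(program):
    """Translate the opcodes once into a single Python expression and compile
    it, so later runs do no opcode dispatch at all (trusted input only)."""
    stack = []
    for op, arg in program:
        if op == "push":
            stack.append(repr(arg))
        elif op == "load_x":
            stack.append("x")
        elif op in ("add", "mul"):
            b, a = stack.pop(), stack.pop()
            symbol = "+" if op == "add" else "*"
            stack.append(f"({a} {symbol} {b})")
        else:
            raise ValueError(op)
    namespace = {}
    exec(f"def compiled(x):\n    return {stack.pop()}\n", namespace)
    return namespace["compiled"]


class Runner:
    def __init__(self, program):
        self.program = program
        self.runs = 0
        self.compiled = None

    def run(self, x):
        if self.compiled is not None:
            return self.compiled(x)  # hot: use the generated code
        self.runs += 1
        if self.runs >= HOT_THRESHOLD:
            self.compiled = compile_program(self.program)
        return interpret(self.program, x)


# 3 * x + 1
program = [("push", 3), ("load_x", None), ("mul", None), ("push", 1), ("add", None)]
r = Runner(program)
results = [r.run(x) for x in range(5)]
print(results)  # [1, 4, 7, 10, 13]
```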

    Update: Recently, on a carefully crafted example, PyPy outperformed a similar C program compiled with gcc -O3. It is a contrived case, but it does illustrate some of these ideas.

    Q4. Why would anyone try something like this?

    From the official site (https://pypy.readthedocs.org/en/latest/architecture.html#mission-statement):

    We aim to provide:

    • a common translation and support framework for producing
      implementations of dynamic languages, emphasizing a clean
      separation between language specification and implementation
      aspects. We call this the RPython toolchain.

    • a compliant, flexible and fast implementation of the Python Language which uses the above toolchain to enable new advanced high-level features without having to encode the low-level details.

    By separating concerns in this way, our implementation of Python - and other dynamic languages - is able to automatically generate a Just-in-Time compiler for any dynamic language. It also allows a mix-and-match approach to implementation decisions, including many that have historically been outside of a user's control, such as target platform, memory and threading models, garbage collection strategies, and optimizations applied, including whether or not to have a JIT in the first place.
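    With the RPython toolchain, the language author writes an ordinary interpreter loop and marks it with hints, and the toolchain derives a tracing JIT from it. The sketch below follows that pattern using `rpython.rlib.jit.JitDriver`: `greens` name the variables that identify a position in the user's program, and `reds` name everything else the loop changes. The tiny accumulator language is invented for illustration, and the code has not been run through the RPython translator. When `rpython` is not importable, a no-op stand-in keeps it runnable on plain CPython, where of course no JIT is generated.

```python
try:
    from rpython.rlib.jit import JitDriver  # only present with the RPython source tree
except ImportError:
    class JitDriver(object):
        """No-op stand-in so this sketch also runs on plain CPython."""

        def __init__(self, greens, reds):
            self.greens = greens
            self.reds = reds

        def jit_merge_point(self, **live_vars):
            pass


# greens: where we are in the user's program; reds: state the loop mutates
jitdriver = JitDriver(greens=["pc", "program"], reds=["acc", "counter"])


def run(program, counter):
    """Interpreter for a hypothetical accumulator language.
    program is a tuple of (opcode, operand) pairs:
      ("add", n)  -> acc += n
      ("loop", t) -> counter -= 1, then jump to index t while counter > 0
    """
    pc = 0
    acc = 0
    while pc < len(program):
        # Hint marking the top of the interpreter's dispatch loop
        jitdriver.jit_merge_point(pc=pc, program=program, acc=acc, counter=counter)
        op, arg = program[pc]
        if op == "add":
            acc += arg
            pc += 1
        elif op == "loop":
            counter -= 1
            if counter > 0:
                pc = arg
            else:
                pc += 1
        else:
            raise ValueError(op)
    return acc


print(run((("add", 2), ("loop", 0)), 5))  # 10
```

    The point of the design is the separation the mission statement describes: the interpreter only says what the language means, and the decision to have a JIT at all lives in the toolchain.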

    The C compiler gcc was written in C for most of its history (it has been built as C++ since GCC 4.8), and the Haskell compiler GHC is written in Haskell. Is there any reason a Python interpreter/compiler should not be written in Python?