python performance benchmarking mojolang

Performance Comparison - Mojo vs Python

Mojo, a programming language, claims to be 65000x faster than Python. Is there any concrete benchmark data that supports this claim? Also, how does the difference hold up on real-world problems?

I primarily encountered this claim on their website and have watched several videos discussing Mojo's speed. However, I am looking for concrete benchmark data that substantiates it.

Solution

TL;DR: The claim comes directly from a blog post on the Mojo website. The benchmark is a computation of the Mandelbrot set. It is neither a rigorous benchmark nor one representative of most Python applications. It is also clearly biased (e.g. sequential vs. parallel code).

They chose it because it has the following properties: "Simple to express", "Pure compute" (i.e. compute-bound), "Irregular computation", "Embarrassingly parallel", "Vectorizable". They also state that "Mandelbrot is a nice problem that demonstrates the capabilities of Mojo". This is the kind of problem where Python does not shine (and which it is not meant to be used for), but where Mojo shines (a close-to-optimal use case for it). Thus, the speed-up can be pretty huge. In fact, this benchmark shows something close to the maximum speed-up you can get, not an average over many real-world applications.

First things first: comparing the performance of languages is meaningless. We compare implementations. CPython is the standard Python implementation, but not the only one. CPython is an interpreter, so code is very slow when it is fully interpreted. Optimized Python code tends not to spend much time in the interpreter but in vectorized operations instead (e.g. the script mostly calls optimized Numpy functions written in C).
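To make the "fully interpreted" case concrete, here is a minimal sketch of a naive escape-time Mandelbrot kernel in pure Python. It is not the code from the Mojo blog post: the function names, the region of the complex plane and the default iteration limit are assumptions chosen for illustration. Every iteration of the inner loop is executed by the CPython interpreter, which is why this kind of code is slow.

```python
def mandelbrot_point(c: complex, max_iter: int = 200) -> int:
    """Return how many iterations it takes for |z| to exceed 2 (capped at max_iter)."""
    z = 0j
    for i in range(max_iter):
        # Compare squared magnitude against 4 to avoid a square root.
        if z.real * z.real + z.imag * z.imag > 4.0:
            return i
        z = z * z + c
    return max_iter


def mandelbrot_naive(width: int, height: int, max_iter: int = 200):
    """Escape-time counts on a width x height grid (region is an assumption)."""
    x_min, x_max, y_min, y_max = -2.0, 0.6, -1.5, 1.5
    rows = []
    for row in range(height):
        y = y_min + row * (y_max - y_min) / height
        counts = []
        for col in range(width):
            x = x_min + col * (x_max - x_min) / width
            counts.append(mandelbrot_point(complex(x, y), max_iter))
        rows.append(counts)
    return rows
```

Timing `mandelbrot_naive(200, 200)` with `time.perf_counter()` on your own machine gives a baseline that any compiled or vectorized variant can then be compared against.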

PyPy is an alternative implementation which uses a JIT compiler to run code faster. It claims to be 4.8x faster than CPython, backed by a detailed set of benchmarks (geometric average). Some benchmarks are very hard to make faster, even using a native language. Symbolic and large string computations are one such case (CPython strings are already well optimized in C). PyPy tends to be faster for numerical code.

In the Mojo benchmark, the naive Python code, the Numpy code and the PyPy code are all sequential, while the final Mojo code is multi-threaded. This is not a fair comparison. One should at least use the multiprocessing module so that parallel code is compared with parallel code. This is a critical point since they ran the code on an 88-core Intel Xeon CPU. Indeed, since the computation is compute-bound, one can expect a speed-up close to 88 using multiple threads. In fact, their parallel Mojo implementation is 85 times faster than their sequential Mojo one. Without a parallel Python implementation, it would be fairer to claim that Mojo is 874 times faster than a naive CPython implementation, 175 times faster than a (rather naive) Numpy code, and 40 times faster than a PyPy implementation (on this specific Mandelbrot set computation).
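As a rough sketch of what a parallel Python baseline could look like, the rows of the image can be distributed over worker processes with `multiprocessing.Pool`. This is an illustration only, not code from the benchmark: the function names, the region constants and the row-per-task split are assumptions.

```python
from multiprocessing import Pool

# Region of the complex plane (an assumption for this sketch).
X_MIN, X_MAX, Y_MIN, Y_MAX = -2.0, 0.6, -1.5, 1.5


def mandelbrot_row(args):
    """Compute the escape-time counts of one image row."""
    row, width, height, max_iter = args
    y = Y_MIN + row * (Y_MAX - Y_MIN) / height
    counts = []
    for col in range(width):
        c = complex(X_MIN + col * (X_MAX - X_MIN) / width, y)
        z = 0j
        n = 0
        while n < max_iter and z.real * z.real + z.imag * z.imag <= 4.0:
            z = z * z + c
            n += 1
        counts.append(n)
    return counts


def mandelbrot_sequential(width, height, max_iter=200):
    """Single-process reference: rows are computed one after another."""
    return [mandelbrot_row((r, width, height, max_iter)) for r in range(height)]


def mandelbrot_parallel(width, height, max_iter=200, processes=None):
    """Same result, with rows spread over worker processes.

    processes=None lets Pool use os.cpu_count() workers.
    """
    tasks = [(r, width, height, max_iter) for r in range(height)]
    with Pool(processes=processes) as pool:
        return pool.map(mandelbrot_row, tasks)
```

`mandelbrot_parallel` should be called under an `if __name__ == "__main__":` guard so that it also works on platforms that spawn worker processes. Since rows are independent, the speed-up over `mandelbrot_sequential` can approach the number of cores for large images, although rows near the set cost more than rows far from it, so the load is not perfectly balanced.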

In the sequential case, most of the speed-up comes from the use of SIMD instructions and instruction-level parallelism. The Python implementations tend not to use them. While Numpy can, not all of its functions are well vectorized (the ones operating on complex numbers tend not to be, AFAIK), and Numpy code tends to be memory-bound due to the creation of many large temporary arrays.
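The temporary-array issue can be seen in a typical vectorized Numpy version of the kernel. The sketch below is an assumed, deliberately simple implementation (not the one from the blog post); the comments mark where full-size or masked temporaries are allocated on every iteration.

```python
import numpy as np


def mandelbrot_numpy(width, height, max_iter=200):
    """Vectorized escape-time counts; grid and names are assumptions."""
    x = np.linspace(-2.0, 0.6, width)
    y = np.linspace(-1.5, 1.5, height)
    c = x[np.newaxis, :] + 1j * y[:, np.newaxis]   # complex grid, shape (height, width)
    z = np.zeros_like(c)
    counts = np.zeros(c.shape, dtype=np.int32)
    for _ in range(max_iter):
        # np.abs allocates a float array, the comparison a boolean one.
        active = np.abs(z) <= 2.0
        # Each fancy-indexing read copies the selected elements; the square
        # and the sum create further temporaries before the write-back.
        z[active] = z[active] ** 2 + c[active]
        counts[active] += 1
    return counts
```

Each pass over the loop streams several arrays through memory instead of keeping one point in registers until it escapes, which is why this style tends to be limited by memory bandwidth rather than by arithmetic.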

Note that tools like Numba and Cython are not shown in the benchmark, even though they are frequently used to speed up numerical code. It would be fairer to include them (or at least mention them).