Tags: performance, haskell, ghc, higher-order-functions, lambda-calculus

Why is Haskell (GHC) so darn fast?


Haskell (with the GHC compiler) is a lot faster than you'd expect. Used correctly, it can get close-ish to low-level languages. (A favorite pastime of Haskellers is trying to get within 5% of C, or even to beat it, though beating it usually just means the C program was inefficient.) My question is: why?

Haskell is declarative and based on the lambda calculus. Machine architectures are clearly imperative, being based, roughly, on Turing machines. Indeed, Haskell doesn't even have a specific evaluation order. And instead of dealing with machine data types, you build algebraic data types all the time.

Weirdest of all, though, are higher-order functions. You would think that creating functions on the fly and throwing them around would make a program slower. But using higher-order functions actually makes Haskell faster. Indeed, it seems that to optimize Haskell code, you need to make it more elegant and abstract rather than more machine-like. None of Haskell's more advanced features seem to hurt its performance, if they don't improve it outright.

Sorry if this sounds ranty, but here is my question: why is Haskell (compiled with GHC) so fast, considering its abstract nature and its differences from physical machines?

Note: The reason I say C and other imperative languages are somewhat similar to Turing machines (though not to the extent that Haskell is similar to the lambda calculus) is that in an imperative language you have a finite number of states (i.e., the current line number), along with a tape (the RAM), such that the state and the current tape determine what to do to the tape. See the Wikipedia entry Turing machine equivalents for the transition from Turing machines to computers.


Solution

  • For a long time it was thought that functional languages couldn't be fast -- and especially lazy functional languages. But this was because their early implementations were, in essence, interpreted and not genuinely compiled.

    A second wave of designs emerged based on graph reduction, which opened up the possibility of much more efficient compilation. Simon Peyton Jones wrote about this research in his two books The Implementation of Functional Programming Languages and Implementing Functional Languages: a tutorial (the former with sections by Wadler and Hancock, and the latter written with David Lester). (Lennart Augustsson also informed me that one key motivation for the former book was to describe how his LML compiler, which was not extensively commented, accomplished its compilation.)

    The key notion behind graph reduction approaches such as those described in these works is that we think of a program not as a sequence of instructions but as a dependency graph, which is evaluated through a series of local reductions. The second key insight is that the evaluation of such a graph need not be interpreted; instead, the graph itself can be built of code. In particular, we can represent a node of the graph not as "either a value, or an 'opcode' plus the values to operate on" but as a function that, when invoked, returns the desired value. The first time it is invoked, it asks its subnodes for their values, operates on them, and then overwrites itself with a new instruction that just says "return the result."
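    To make the self-overwriting idea concrete, here is a minimal sketch in Haskell itself, modelling a graph node as a mutable cell that replaces its own contents after the first evaluation. This is only an illustration of the principle: GHC's real thunks are machine-level closures with info tables, not IORefs, and the names below (Node, Cell, mkThunk, force) are invented for this example.

        import Data.IORef

        -- A node is either an unevaluated computation or an already-computed value.
        data Node a = Thunk (IO a) | Value a

        -- A cell that overwrites itself with its result after the first forcing.
        newtype Cell a = Cell (IORef (Node a))

        mkThunk :: IO a -> IO (Cell a)
        mkThunk act = Cell <$> newIORef (Thunk act)

        force :: Cell a -> IO a
        force (Cell ref) = do
          node <- readIORef ref
          case node of
            Value v   -> pure v            -- already evaluated: just return it
            Thunk act -> do
              v <- act                     -- ask the subnodes for their values
              writeIORef ref (Value v)     -- overwrite the node: "return the result"
              pure v

        main :: IO ()
        main = do
          c <- mkThunk (putStrLn "computing..." >> pure (6 * 7))
          force c >>= print   -- prints "computing..." then 42
          force c >>= print   -- prints only 42: the work is never redone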

    This is described in a later paper that lays out the basics of how GHC still works today (modulo many tweaks): "Implementing Lazy Functional Languages on Stock Hardware: The Spineless Tagless G-Machine". The current execution model for GHC is documented in more detail at the GHC Wiki.

    So the insight is that the strict distinction between "data" and "code", which we think of as "fundamental" to how machines work, is not how they must work; it is imposed by our compilers. So we can throw that out and have code (a compiler) that generates self-modifying code (the executable), and it can all work quite nicely.

    Thus it turns out that while machine architectures are imperative in a certain sense, languages may map to them in very surprising ways that don't look like conventional C-style flow control, and if we think low-level enough, this may also be efficient.

    On top of this, there are many other optimizations opened up by purity in particular, since it allows a greater range of "safe" transformations. When and how to apply these transformations such that they make things better and not worse is of course an empirical question, and on this and many other small choices, years of effort have gone into both theory and practical benchmarking. So this of course plays a role as well. A paper that provides a good example of this sort of research is "Making a Fast Curry: Push/Enter vs. Eval/Apply for Higher-Order Languages."
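    As a deliberately simplified example of such a purity-enabled transformation: because map cannot perform side effects, collapsing two list traversals into one cannot change a program's observable behaviour, so the compiler is allowed to rewrite one into the other. A rule in this spirit can be written directly as a GHC rewrite rule (GHC's base library ships a more carefully phase-controlled version, and real pipelines mostly rely on foldr/build fusion):

        {-# RULES
        "map/map" forall f g xs. map f (map g xs) = map (f . g) xs
          #-}

        -- With a rule like this in scope, a pipeline such as
        process :: [Int] -> [Int]
        process = map (+ 1) . map (* 2)
        -- can be rewritten to traverse the list once, as map ((+ 1) . (* 2)).

    In an impure language this rewrite would be unsound in general, since merging the two traversals could reorder their side effects.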

    Finally, it should be noted that this model still introduces an overhead due to indirections. This can be avoided in cases where we know that it is "safe" to do things strictly and thus elide graph indirections. The mechanisms that infer strictness/demand are again documented in some detail at the GHC Wiki.
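    In practice this shows up as things like strict accumulators: when the demand analysis can see that a value will certainly be needed, GHC may evaluate it eagerly and even pass it around unboxed, rather than allocating a chain of thunks. A small sketch (the bang patterns just make explicit what the analysis can often infer on its own for a loop like this):

        {-# LANGUAGE BangPatterns #-}

        -- Summing with a strict accumulator: forcing acc at each step avoids
        -- building a graph of (acc + x) thunks, and GHC can often compile the
        -- loop to work on a raw, unboxed machine integer.
        sumStrict :: [Int] -> Int
        sumStrict = go 0
          where
            go !acc []       = acc
            go !acc (x : xs) = go (acc + x) xs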