
Generalizing/compiling haskell code into a lambda


I am about 90% sure that the title of this question is wrong, but I have no idea what the right title would be (I will gladly edit the title if anyone has suggestions!).

When reading up on Haskell and the core principles of the language, you always find that it is a language "based on lambda expressions". I remember reading somewhere that this means that, in the end, the main function just gets "preprocessed" into one big lambda: everything gets inlined, and basically your entire code becomes one single, huge lambda expression.

My questions are:

  1. Is what I said above true?

  2. If the answer to question 1 is "yes", is there any... decompiler/partial compiler/preprocessor? I know about a tool that lets you see the assembly code behind languages like C/C++ and Haskell, but is there anything I could use to explore the generated lambda expression?

I am asking this purely for educational purposes, not to solve a particular problem. I simply wish to learn more about a language I find extremely fascinating.


Solution

  • It helps to distinguish between the semantics of Haskell and the implementation in GHC, partly because we use different terms for language semantics than for assembly, and partly because some other compiler might do things differently than GHC.

    Every Haskell program defines main, which is an expression of type IO () (strictly, IO t for some type t). I don't like to call it a "lambda expression" because the type shows that it's not a function. The definition of main is a nested tree of function applications. Even the sequential lines in a do block are defined as calls to the functions (>>) and (>>=).
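    A minimal sketch of that desugaring, with hypothetical names chosen only for illustration (greetDo, greetDesugared, pairsDo, pairsDesugared): each do block is written a second time as the explicit (>>) and (>>=) calls it stands for, following the standard rules. The list-monad pair is included only because its results are plain values that can be compared directly.

    ```haskell
    module Main (main) where

    -- An IO action written with do-notation.
    greetDo :: IO ()
    greetDo = do
      putStrLn "first line"
      name <- return "world"
      putStrLn ("hello, " ++ name)

    -- The same action spelled out: a plain statement becomes (>>),
    -- and a binding "name <- e" becomes e >>= \name -> ...
    greetDesugared :: IO ()
    greetDesugared =
      putStrLn "first line" >>
        (return "world" >>= \name -> putStrLn ("hello, " ++ name))

    -- The same rules apply to any monad; in the list monad the results
    -- are ordinary values that can be compared with (==).
    pairsDo :: [(Int, Char)]
    pairsDo = do
      n <- [1, 2]
      c <- "ab"
      return (n, c)

    pairsDesugared :: [(Int, Char)]
    pairsDesugared = [1, 2] >>= \n -> "ab" >>= \c -> return (n, c)

    -- main is itself just a value of type IO (), built by combining other actions.
    main :: IO ()
    main = greetDo >> greetDesugared >> print (pairsDo == pairsDesugared)
    ```

    Running it prints the two greeting lines twice and then True, since both spellings describe the same actions.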

    GHC uses heuristics to decide what to inline, aiming for the best runtime performance. It will usually inline small definitions that aren't recursive, so in general a program does not collapse into one giant expression. I believe the runtime system maintains a stack of functions currently being evaluated, not unlike the call stack you get from compiling function calls in C or other imperative languages.
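    GHC's inlining decisions can also be nudged from source with the INLINE and NOINLINE pragmas. The sketch below uses hypothetical function names: it marks a small non-recursive function INLINE, leaves a self-recursive one alone (GHC generally does not inline a recursive function into its callers), and marks a third NOINLINE. The pragmas are hints to the optimiser and do not change what the program computes.

    ```haskell
    module Main (main) where

    -- Small and non-recursive: GHC's heuristics would probably inline this
    -- even without the pragma; INLINE just makes the request explicit.
    square :: Int -> Int
    square x = x * x
    {-# INLINE square #-}

    -- Self-recursive: GHC generally does not inline a recursive function
    -- into its callers, so uses of sumSquares remain calls.
    sumSquares :: [Int] -> Int
    sumSquares []       = 0
    sumSquares (y : ys) = square y + sumSquares ys

    -- NOINLINE forbids inlining, which can be handy for keeping a
    -- definition visible as a separate binding in dumped Core.
    describe :: Int -> String
    describe n = "sum of squares: " ++ show n
    {-# NOINLINE describe #-}

    main :: IO ()
    main = putStrLn (describe (sumSquares [1, 2, 3]))
    ```

    Running it prints "sum of squares: 14" regardless of which pragmas are present.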

    GHC provides many options for printing intermediate stages of compilation, such as -ddump-simpl for Core and -ddump-cmm for Cmm. I'm not sure which of them you will find most interesting. Core is the lowest-level representation that still feels like Haskell. Cmm (GHC's dialect of C--) is the highest-level representation that feels like assembly.
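    As a concrete starting point, the dump flags can be placed in an OPTIONS_GHC pragma at the top of a module. This is a sketch assuming a single-file program; the file name Main.hs and the function addOne are illustrative. -ddump-simpl prints Core after the simplifier, -dsuppress-all hides most annotations (such as types and coercions) to keep it readable, and -ddump-to-file writes the dump to a file instead of the terminal. On recent GHC versions, swapping in -ddump-stg-final, -ddump-cmm or -ddump-asm should show the later stages.

    ```haskell
    {-# OPTIONS_GHC -ddump-simpl -dsuppress-all -ddump-to-file #-}
    -- Compile with e.g. `ghc -O Main.hs`; the Core for this module should end
    -- up in a file with a .dump-simpl suffix rather than on the terminal.
    module Main (main) where

    -- A deliberately tiny function so the dumped Core stays short.
    addOne :: Int -> Int
    addOne x = x + 1

    main :: IO ()
    main = print (map addOne [1, 2, 3])
    ```

    The program itself just prints [2,3,4]; the interesting output is the dump file, where addOne appears as an explicit lambda over an Int.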