Search code examples
haskelltypescompiler-constructionghc

GHC internals: is there C implementation of the type system?


I'm looking into internals of GHC and I find all the parsing and type system written completely in Haskell. Low-level core of the language is provided by RTS. The question is which one of the following is true?

  • RTS contains C implementation of the type system and other basic parts of Haskell (I didn't find it, RTS is mainly GC and threading)
  • Everything is implemented in Haskell itself. But it seems quite tricky because building GHC already requires GHC.

Could you explain development logic of the compiler? For example Python internals provide an opaque implementation of everything in C.


Solution

  • As others have noted in the comments, GHC is written almost entirely in Haskell (plus select GHC extensions) and is intended to be compiled with itself. In fact, the only program in the world that can compile the GHC compiler is the GHC compiler! In particular, parsing and type inference are implemented in Haskell code, and you won't find a C implementation hidden in there anywhere.

    The best source for understanding the internal structure of the compiler (and what's implemented how) is the GHC Developer Wiki and specifically the "GHC Commentary" link. If you have a fair bit of spare time, the video series from the Portland 2006 GHC Hackathon is absolutely fascinating.

    Note that the idea of a compiler being written in the language it compiles is not unusual. Many compilers are "self-hosting" meaning that they are written in the language they compile and are intended to compile themselves. See, for example, this question on another Stack Exchange sister site: Why are self-hosting compilers considered a rite of passage for new languages?, or simply Google for "self-hosting compiler"

    As you say, this is "tricky", because you need a way to get the process started. Some approaches are:

    • You can write the first compiler in a different language that already has a compiler (or write it in assembly language); then, once you have a running compiler, you can port it to the same language it compiles. According to this Quora answer, the first C compiler was written this way. It was written in "NewB" whose compiler was written in "B", a self-hosting compiler that had originally been written in assembly and then rewritten in itself.

    • If the language is popular enough to have another compiler, write the compiler in its own language and compile it in phases, first with the other compiler, then with itself (as compiled by the other compiler), then again with itself (as compiled by itself). The last two compiler executables can be compared as a sort of massive test that the compiler is correct. The Gnu C Compiler can be compiled this way (and this certainly used to be the standard way to install it from source, using the vendor's [inferior!] C compiler to get started).

    • If an interpreter written in another language already exists or is easy to write, the compiler can be run by the interpreter to compile its own source code, and thereafter the compiled compiler can be used to compile itself. The first LISP compiler is claimed to be the first compiler to bootstrap itself this way.

    The bootstrapping process can often be simplified by writing the compiler (at least initially) in a restricted core of the language, even though the compiler itself is capable of compiling the full language. Then, a sub-par existing compiler or a simplified bootstrapping compiler or interpreter can get the process started.

    According to the Wikipedia entry for GHC, the original GHC compiler was written in 1989 in Lazy ML, then rewritten in Haskell later the same year. These days, new versions of GHC with all their shiny new features are compiled on older versions of GHC.

    The situation for the Python interpreter is a little different. An interpreter can be written in the language it interprets, of course, and there are many examples in the Lisp world of writing Lisp interpreters in Lisp (for fun, or in developing a new Lisp dialect, or because you're inventing Lisp), but it can't be interpreters all the way down, so eventually you'd need either a compiler or an interpreter implemented in another language. As a result, most interpreters aren't self-hosting: the mainstream interpreters for Python, Ruby, and PHP are written in C. (Though, PyPy is an alternate implementation of the Python interpreter that's written in Python, so...)