Search code examples
rubyhaskellcompiler-constructionjittypechecking

When / where type checking occurs in the compilation process


Wondering at a high level when type check typically occurs (textbook vs. in practice) in the compilation process (at a high level). Roughly my understanding of the compilation process is:

  1. Parse the source code into an AST
  2. Convert the AST into an Intermediate Representation IR
  3. Optimize the IR (i.e. SSA Form, Register Allocation, etc.)
  4. Simplify the IR
  5. Generate final output code

Wondering if the typechecking occurs in between (1) and (2), (2) and (3), or after (4), or if it occurs sprinkled throughout the whole process, or something else. I'd be interested to know the answer for object oriented, functional, and logic programming (in that order of priority), but if I had to pick one then OO such as a dynamically typed language like Ruby, or statically typed functional language like Haskell.


Solution

  • Static type checking is usually performed on the AST, so it either happens between 1 and 2 or as part of 2 (meaning that the IR-generator invokes functions from the type checker whenever it processes an AST-node - of course the IR generator and the type checker should still live in different modules/files).

    In theory, you could perform type checking on the IR, but that will usually lead to at least one of the following problems:

    1. The IR does not contain enough information to still capture all of the errors that you want.
    2. The IR does not contain enough information to produce the best error messages in all cases. As an example of this, consider that the IR represents array accesses and pointer arithmetic using the same instructions. Now you want to produce an error for an array access with a floating point index. If the message is "Floating point values not allowed as operands to pointer arithmetic" that would be confusing when the code doesn't contain any pointer arithmetic. Requiring the user to know that array accesses are represented as pointer arithmetic to make sense of the error message, would not be very user friendly.
    3. You add a large amount of extra information to the IR for the purposes of type checking, making the IR more complicated and unwieldy, but all you've gained from that is the ability to process the IR in the same way that you would the AST, without gaining any benefits.

    Usually working on the IR instead of the AST means that you don't have to handle as many cases (exactly because the IR represents different things using the same instructions). That's the main benefit. But if you then jump through extra hoops just to be able to treat the cases differently again, you might just as well use the AST in the first place.

    So type checking on the AST¹ is usually preferred. GHC (the main Haskell compiler) performs type checking the AST.

    ¹ Or at least something very close to the AST - there might, for example, be a representation between the AST and the final IR, which simplifies things in some ways (such as removing flattening nested expressions), without losing information relevant to type checking.


    Dynamic type checking happens at run time. The code that performs these dynamic type checks is either part of the interpreter (if there is an interpreter) or inserted by the code generator.

    Ruby performs type checking in the interpreter.