Search code examples
haskellfunctional-programminginterpretertype-systems

Designing the type system of a simple statically typed language (in Haskell)


I've been toying with the idea of writing an interpreter for an imperative, statically typed language to get used to functional programming and Haskell, however I've never really had a clear grammar in mind, which often lead to unsatisfying code and the urge to rewrite everything so I came here for help. How should I go about designing a relatively simple but extensible type system?

I want to support basic primitives like numeric types, booleans, chars, etc. (don't want to dabble with arrays or record structures until I've got the basics in place) and their associated basic operations. My biggest issue is that I don't exactly know how the relation between types and operators should be implemented.

I don't know much about Haskell yet, but the simple solution of defining a bunch of repetitive sum types like

data ArithmeticOperator =
    Plus
    | Min
    | Mul 
    | Div

data LogicalOperator =
    And
    | Or
    | Not

doesn't seem to eloquent to me, since this type division would further propagate into the structures that build upon these types like expressions, and having to pattern match each operator when evaluating expressions seems really tedious and not easily extendable.

So then I thought about defining a flexible type for operators, like

data Operator a b =
    UnaryOperator (a -> b)
    | BinaryOperator (a -> a -> b)

where a stands for the parameters type and b is the return type. Problem with this is that I don't really know how to enforce the types to be only the ones that I intend to support. It looks more concise but I'm not sure if this is "right" either.

Finally, are there any resources that introduce this topic in a beginner friendly way? I wouldn't want to get too deep in the subject, but I'd love to read about.. well, common principles/considerations when it comes to designing a type system.


Solution

  • As per the comments, don't try this as your first interpreter. If you haven't already written an interpreter for the untyped lambda calculus or worked your way through a tutorial, like Write Yourself a Scheme in 48 Hours, do that first.

    Anyway, here's a simple implementation of an interpreter for a statically typed expression language with boolean and number types, a few built-in operators (including one with ad-hoc polymorphism), variables, and let x=... in ... variable bindings, but no lambdas. It illustrates a common approach for designing a typed interpreter but is missing enough that it won't spoil your fun.

    Note: I have intentionally avoided using any intermediate or advanced Haskell features (e.g., the types ExprU and ExprT aren't unified in a single polymorphic type -- no "trees that grow" for us!; I haven't used GADTs to leverage the Haskell type system in typing the target language; etc.). These advanced techniques can lead to safer code -- and are also totally awesome so you definitely want to take a look at them in the future -- but they aren't necessary to get a basic typed interpreter working.

    When writing interpreters, it's a good idea to turn on -Wall -- it'll remind you of which patterns (i.e., expression types) you've forgotten to handle:

    {-# OPTIONS_GHC -Wall #-}
    

    Also, to maintain our sanity, we'll need to use some monads:

    import Control.Monad
    import Control.Monad.Reader
    import Control.Monad.Except
    

    You mentioned in your question that you were struggling with two approaches: dividing up operators by type at the outset versus somehow reflecting the operator types within the Haskell type system. Your intuition about the first approach is spot on -- it's not going to work very well. The second approach is do-able, but you're going to quickly run up against some very advanced Haskell techniques that you probably aren't ready for yet.

    Instead, for our statically typed language, let's start by defining an entirely untyped abstract expression syntax. Note that this is the sort of abstract syntax that might be generated by a parser that's completely ignorant of types:

    -- Untyped expressions
    data ExprU
      = FalseU | TrueU           -- boolean literals
      | NumU Double              -- numeric literal
      | VarU String              -- variable
      | UnU UnOp ExprU           -- unary operator
      | BinU BinOp ExprU ExprU   -- binary operator
      | LetU String ExprU ExprU  -- let x = expr1 in expr2
    data UnOp = NegOp | NotOp
      deriving (Show)
    data BinOp = PlusOp | MulOp | AndOp | OrOp | EqualsOp
      deriving (Show)
    

    Note that we could write an interpreter for this directly. However, the interpreter would have to deal with ill-typed expressions like:

    BinU PlusOp FalseU (NumU 1)   -- False + 1
    

    which would kind of defeat the whole purpose of defining a statically typed language.

    The key insight is that we can take this untyped language and, before interpreting it, actually type check it! There are cool techniques for using the Haskell type system to type check a target language, but it's much easier to just define a separate data type to represent expression types:

    -- Simple expression types
    data Typ
      = BoolT
      | NumT
      deriving (Show, Eq)
    

    For our operators, it'll be convenient to give them "types" as well:

    -- Types of operators
    data BinTyp = BinTyp Typ Typ Typ   -- binary ops: two arg types plus result type
    data UnTyp  = UnTyp Typ Typ        -- unary ops: arg type plus result type
    

    In a language with first-class functions, we'd probably combine these types into a single Haskell Typ that could represent not only "primitive" types like bools and nums but also function types, like Bool -> Bool -> Bool, and so on. However, for this simple language, we'll keep "expression types" and "operator types" separate.

    What do we do with these types? Well, we take our untyped expressions ExprU and type check them by adding type annotations to each expression:

    -- Typed expressions
    data ExprT
      = BoolLit Bool
      | NumLit Double
      | VarT Typ String
      | UnT Typ UnOp ExprT
      | BinT Typ BinOp ExprT ExprT
      | LetT Typ String ExprT ExprT
    

    Here, every constructor (except literals) has a Typ field that gives the type of the associated expression. (Actually, we could have added a Typ field to the literals, too, even though it's redundant.) It'll be helpful to have a helper function to extract the type from an ExprT:

    exprTyp :: ExprT -> Typ
    exprTyp (BoolLit _) = BoolT
    exprTyp (NumLit _) = NumT
    exprTyp (VarT t _) = t
    exprTyp (UnT t _ _) = t
    exprTyp (BinT t _ _ _) = t
    exprTyp (LetT t _ _ _) = t
    

    Type checking will take place in a monad that keeps track of the types of variables (the one thing that we can't immediately figure out by inspecting expressions):

    type TypContext = [(String, Typ)]   -- context of variable types
    type TC = ExceptT Error (Reader TypContext)
    

    For now, we can just use strings for our type errors:

    type Error = String
    

    Our type checker is surprisingly easy to write. I takes an untyped expression ExprU, and adds the appropriate type annotations to make an ExprT:

    tc :: ExprU -> TC ExprT
    

    Creating the "typed version" of literals is easy:

    tc (FalseU) = pure $ BoolLit False
    tc (TrueU) = pure $ BoolLit True
    tc (NumU x) = pure $ NumLit x
    

    In our language, types for variables are easy too. We only permit variables to be used after they've been defined (by a LetU binding -- see below), so their types will always be available in the current context:

    tc (VarU var) = do
      mt <- asks (lookup var)
      case mt of
        Just t -> pure $ VarT t var
        Nothing -> throwError $ "undefined variable " ++ var
    

    Types for unary operators are easy. The only two unary operators are "negate" and "not", and both only apply to precisely one argument type and produce precisely one result type. The unTyp function in the where clause tells us what UnTyp our unary operator has:

    tc (UnU op e) = do
      let UnTyp targ tresult = unTyp op
      e' <- tc e
      let t = exprTyp e'
      when (t /= targ) $ throwError $ "op " ++ show op ++ 
        " expected arg of type " ++ show targ ++ ", got " ++ show t
      pure $ UnT tresult op e'
      where
        unTyp NegOp = UnTyp NumT NumT
        unTyp NotOp = UnTyp BoolT BoolT
    

    For binary operators, EqualsOp is a little complicated. We want to implement some ad hoc polymorphism so it can be applied to both booleans and numerics, though we'll require that that the types match (so False == 1 isn't allowed). So, we'll check the types of the arguments and ensure they match. However, no matter what types the argument are, the type of the BinU EqualsOp _ _ expression will always be boolean, so the typed version will always be BinT BootT EqualsOp _ _:

    tc (BinU EqualsOp e1 e2) = do
      e1' <- tc e1
      e2' <- tc e2
      let t1 = exprTyp e1'
          t2 = exprTyp e2'
      when (t1 /= t2) $ throwError $ "op EqualOp needs to compare equal types"
      pure $ BinT BoolT EqualsOp e1' e2'
    

    The other binary operators are monotyped, so we handle them the same as the (monotyped) unary operators above:

    tc (BinU op e1 e2) = do
      let BinTyp targ1 targ2 tresult = binTyp op
      e1' <- tc e1
      e2' <- tc e2
      let t1 = exprTyp e1'
          t2 = exprTyp e2'
      when (t1 /= targ1) $ throwError $ "op " ++ show op ++
        " expected left arg of type " ++ show targ1 ++ ", got " ++ show t1
      when (t2 /= targ2) $ throwError $ "op " ++ show op ++
        " expected right arg of type " ++ show targ2 ++ ", got " ++ show t2
      pure $ BinT tresult op e1' e2'
      where
        binTyp PlusOp = BinTyp NumT NumT NumT
        binTyp MulOp = BinTyp NumT NumT NumT
        binTyp AndOp = BinTyp BoolT BoolT BoolT
        binTyp OrOp = BinTyp BoolT BoolT BoolT
        binTyp EqualsOp = error "internal error"
    

    You might expect type checking of LetU expressions to be complicated, but it's pretty straightforward. For let x=exp1 in exp2, we just calculate the type of exp1, and then add that type for x to the type context when calculating the type of exp2:

    tc (LetU var e1 e2) = do
      e1' <- tc e1
      let t1 = exprTyp e1'
      e2' <- local ((var,t1):) $ tc e2
      let t2 = exprTyp e2'
      pure $ LetT t2 var e1' e2'
    

    That's all there is to the type checker.

    Once the type checker has been run to create an ExprT with a sound type, we'll want to evaluate it. We'll represent values of expressions as:

    -- Values
    data Value
      = BoolV Bool
      | NumV Double
      deriving (Show)
    

    Evaluation will take place in a monad with a context that assigned values to variables:

    type ValContext = [(String, Value)]   -- context of variable values
    type E = Reader ValContext
    

    Note that we don't need an ExceptT transformer here. It turns out that a type-checked program can't generate run-time errors in our language.

    The evaluator/interpreter:

    eval :: ExprT -> E Value
    

    evaluates literals in the obvious way:

    eval (BoolLit b) = pure $ BoolV b
    eval (NumLit x) = pure $ NumV x
    

    Variable values are looked up in the current context:

    eval (VarT _ var) = do
      mt <- asks (lookup var)
      case mt of
        Just v -> pure $ v
        Nothing -> internalerror
    

    Note that the type-checker has already ensured that variables are only used when they've been defined, so a failed lookup "can't happen". We use the function internalerror (defined below) to satisfy the compiler that all cases have been handled so that we avoid a warning, but internalerror won't be called unless there's a bug in our interpreter.

    Unary operators are interpreted like so:

    eval (UnT _ op e) = run op <$> eval e
      where run NegOp (NumV x) = NumV (-x)
            run NotOp (BoolV b) = BoolV (not b)
            run _ _ = internalerror
    

    Again, because of the type-checker, we can't be applying NegOp to a boolean or NotOp to a number, so the missing cases here (e.g., run NegOp (BoolV b)) can't happen. Unfortunately, this means that we lose some of the benefits of having -Wall turned on -- if we forget to handle an new operator, it'll throw an internalerror at runtime. <insert sad emoji>

    Binary operators are interpreted similarly:

    eval (BinT _ op e1 e2) = run op <$> eval e1 <*> eval e2
      where run EqualsOp (BoolV v1) (BoolV v2) = BoolV $ v1 == v2
            run EqualsOp (NumV v1) (NumV v2) = BoolV $ v1 == v2
            run PlusOp (NumV v1) (NumV v2) = NumV $ v1 + v2
            run MulOp (NumV v1) (NumV v2) = NumV $ v1 * v2
            run AndOp (BoolV v1) (BoolV v2) = BoolV $ v1 && v2
            run OrOp (BoolV v1) (BoolV v2) = BoolV $ v1 || v2
            run _ _ _ = internalerror
    

    Because the Haskell == operator is polymorphic, we could have added an Eq instance to the Value type and replaced the first two cases with:

      where run EqualsOp v1 v2 = BoolV $ v1 == v2
    

    but I wanted to illustrate the fact that EqualsOp (BoolV v1) (NumV v2) is never going to happen in a type-checked program.

    Finally, we handle LetT like so:

    eval (LetT _ var e1 e2) = do
      v1 <- eval e1
      local ((var,v1):) $ eval e2
    

    That's our evaluator/interpreter.

    Something interesting to note. In our definition of eval, we never actually reference the type annotations that tc has added! For this reason, we actually could have written eval to interpret the original untyped ExprU because the fact that tc completed would be enough to ensure that the ExprU could be interpreted without runtime type errors. That is, in this simple language, the fact that a program type checks is way more important than the types that are calculated during type checking. In more complicated languages, the type annotations might be more useful.

    Anyway, that's it. Here's full code plus an example program expr1 in the target language:

    {-# OPTIONS_GHC -Wall #-}
    
    import Control.Monad
    import Control.Monad.Reader
    import Control.Monad.Except
    
    -- Untyped expressions
    data ExprU
      = FalseU | TrueU           -- boolean literals
      | NumU Double              -- numeric literal
      | VarU String              -- variable
      | UnU UnOp ExprU           -- unary operator
      | BinU BinOp ExprU ExprU   -- binary operator
      | LetU String ExprU ExprU  -- let x = expr1 in expr2
    data UnOp = NegOp | NotOp
      deriving (Show)
    data BinOp = PlusOp | MulOp | AndOp | OrOp | EqualsOp
      deriving (Show)
    
    -- Simple expression types
    data Typ
      = BoolT
      | NumT
      deriving (Show, Eq)
    
    -- Types of operators
    data BinTyp = BinTyp Typ Typ Typ
    data UnTyp  = UnTyp Typ Typ
    
    -- Typed expressions
    data ExprT
      = BoolLit Bool
      | NumLit Double
      | VarT Typ String
      | UnT Typ UnOp ExprT
      | BinT Typ BinOp ExprT ExprT
      | LetT Typ String ExprT ExprT
    
    exprTyp :: ExprT -> Typ
    exprTyp (BoolLit _) = BoolT
    exprTyp (NumLit _) = NumT
    exprTyp (VarT t _) = t
    exprTyp (UnT t _ _) = t
    exprTyp (BinT t _ _ _) = t
    exprTyp (LetT t _ _ _) = t
    
    -- Type check an expression
    type Error = String
    type TypContext = [(String, Typ)]   -- context of variable types
    type TC = ExceptT Error (Reader TypContext)
    runTC :: TC a -> a
    runTC act = case runReader (runExceptT act) [] of
      Left err -> error err
      Right a -> a
    
    tc :: ExprU -> TC ExprT
    tc (FalseU) = pure $ BoolLit False
    tc (TrueU) = pure $ BoolLit True
    tc (NumU x) = pure $ NumLit x
    tc (VarU var) = do
      mt <- asks (lookup var)
      case mt of
        Just t -> pure $ VarT t var
        Nothing -> throwError $ "undefined variable " ++ var
    tc (UnU op e) = do
      let UnTyp targ tresult = unTyp op
      e' <- tc e
      let t = exprTyp e'
      when (t /= targ) $ throwError $ "op " ++ show op ++
        " expected arg of type " ++ show targ ++ ", got " ++ show t
      pure $ UnT tresult op e'
      where
        unTyp NegOp = UnTyp NumT NumT
        unTyp NotOp = UnTyp BoolT BoolT
    tc (BinU EqualsOp e1 e2) = do
      e1' <- tc e1
      e2' <- tc e2
      let t1 = exprTyp e1'
          t2 = exprTyp e2'
      when (t1 /= t2) $ throwError $ "op EqualOp needs to compare equal types"
      pure $ BinT BoolT EqualsOp e1' e2'
    tc (BinU op e1 e2) = do
      let BinTyp targ1 targ2 tresult = binTyp op
      e1' <- tc e1
      e2' <- tc e2
      let t1 = exprTyp e1'
          t2 = exprTyp e2'
      when (t1 /= targ1) $ throwError $ "op " ++ show op ++
        " expected left arg of type " ++ show targ1 ++ ", got " ++ show t1
      when (t2 /= targ2) $ throwError $ "op " ++ show op ++
        " expected right arg of type " ++ show targ2 ++ ", got " ++ show t2
      pure $ BinT tresult op e1' e2'
      where
        binTyp PlusOp = BinTyp NumT NumT NumT
        binTyp MulOp = BinTyp NumT NumT NumT
        binTyp AndOp = BinTyp BoolT BoolT BoolT
        binTyp OrOp = BinTyp BoolT BoolT BoolT
        binTyp EqualsOp = error "internal error"
    tc (LetU var e1 e2) = do
      e1' <- tc e1
      let t1 = exprTyp e1'
      e2' <- local ((var,t1):) $ tc e2
      let t2 = exprTyp e2'
      pure $ LetT t2 var e1' e2'
    
    -- Evaluate a typed expression
    internalerror :: a
    internalerror = error "can't happen, internal error in type checker"
    
    -- Values
    data Value
      = BoolV Bool
      | NumV Double
      deriving (Show)
    
    type ValContext = [(String, Value)]   -- context of variable values
    type E = Reader ValContext
    runE :: E a -> a
    runE act = runReader act []
    
    eval :: ExprT -> E Value
    eval (BoolLit b) = pure $ BoolV b
    eval (NumLit x) = pure $ NumV x
    eval (VarT _ var) = do
      mt <- asks (lookup var)
      case mt of
        Just v -> pure $ v
        Nothing -> internalerror
    eval (UnT _ op e) = run op <$> eval e
      where run NegOp (NumV x) = NumV (-x)
            run NotOp (BoolV b) = BoolV (not b)
            run _ _ = internalerror
    eval (BinT _ op e1 e2) = run op <$> eval e1 <*> eval e2
      where run EqualsOp (BoolV v1) (BoolV v2) = BoolV $ v1 == v2
            run EqualsOp (NumV v1) (NumV v2) = BoolV $ v1 == v2
            run PlusOp (NumV v1) (NumV v2) = NumV $ v1 + v2
            run MulOp (NumV v1) (NumV v2) = NumV $ v1 * v2
            run AndOp (BoolV v1) (BoolV v2) = BoolV $ v1 && v2
            run OrOp (BoolV v1) (BoolV v2) = BoolV $ v1 || v2
            run _ _ _ = internalerror
    eval (LetT _ var e1 e2) = do
      v1 <- eval e1
      local ((var,v1):) $ eval e2
    
    expr1 :: ExprU
    expr1 = LetU "x" (BinU PlusOp (NumU 2) (NumU 3)) (LetU "y" (BinU MulOp (VarU "x") (NumU 5)) (BinU EqualsOp (VarU "y") (NumU 25)))
    
    val1 :: Value
    val1 = let e1' = runTC (tc expr1) in runE (eval e1')
    
    main :: IO ()
    main = do
      print $ val1