Search code examples
haskellgadt

Converting an untyped AST for a simple typed language into a GADT


I have an ADT representing the AST for a simple language:

data UTerm = UTrue
      | UFalse
      | UIf UTerm UTerm UTerm
      | UZero
      | USucc UTerm
      | UIsZero UTerm

This data structure can represent invalid terms that don't follow the type rules of the language, like UIsZero UFalse, so I'd like to use a GADT that enforces well-typedness:

{-# LANGUAGE GADTs #-}

data TTerm a where
  TTrue :: TTerm Bool
  TFalse :: TTerm Bool
  TIf :: TTerm Bool -> TTerm a -> TTerm a -> TTerm a
  TZero :: TTerm Int
  TSucc :: TTerm Int -> TTerm Int
  TIsZero :: TTerm Int -> TTerm Bool

My problem is to type check a UTerm and convert it into a TTerm. My first thought was UTerm -> Maybe (TTerm a), but this of course doesn't work because it's not valid for all as. I don't even know what the type would be, because we don't know if a is going to be Int or Bool. Then I thought I could write a different type checking function for each of the possible values of a:

import Control.Applicative

typecheckbool :: UTerm -> Maybe (TTerm Bool)
typecheckbool UTrue = Just TTrue
typecheckbool UFalse = Just TFalse
typecheckbool (UIsZero a) = TIsZero <$> typecheckint a
typecheckbool _ = Nothing

typecheckint :: UTerm -> Maybe (TTerm Int)
typecheckint UZero = Just TZero
typecheckint (USucc a) = TSucc <$> typecheckint a
typecheckint (UIf a b c) = TIf <$> typecheckbool a <*> typecheckint b <*> typecheckint c
typecheckint UTrue = Nothing
typecheckint UFalse = Nothing
typecheckint (UIsZero _) = Nothing

This works for some cases, for a subset of the language where TIf requires its consequent and alternative are Ints (But TIf TTrue TFalse TTrue is actually totally valid), and where we know the target type of the expression we're typing.

What's the right way to convert from a UTerm to a TTerm?


Solution

  • The standard technique is to define an existential type:

    data ETerm_ where
        ETerm_ :: TTerm a -> ETerm
    

    In this case, you may also want some term-level evidence of which type you have; e.g.

    data Type a where
        TInt :: Type Int
        TBool :: Type Bool
    

    then the real ETerm would look like this:

    data ETerm where
        ETerm :: Type a -> TTerm a -> ETerm
    

    The interesting case of type checking is then something like

    typeCheck (UIf ucond ut uf) = do
        ETerm TBool tcond <- typeCheck ucond
        ETerm tyt tt <- typeCheck ut
        ETerm tyf tf <- typeCheck uf
        case (tyt, tyf) of
            (TBool, TBool) -> return (ETerm TBool (TIf tcond tt tf))
            (TInt , TInt ) -> return (ETerm TInt  (TIf tcond tt tf))
            _ -> fail "branches have different types"