Search code examples
parsinghaskellparsec

Good type design in Haskell for the AST of a simple language


I'm new to Haskell, and am working through the Haskell LLVM tutorial. In it, the author defines a simple algebraic data type to represent the AST.

type Name = String

data Expr
  = Float Double
  | BinOp Op Expr Expr
  | Var String
  | Call Name [Expr]
  | Function Name [Expr] Expr
  | Extern Name [Expr]
  deriving (Eq, Ord, Show)

data Op
  = Plus
  | Minus
  | Times
  | Divide
  deriving (Eq, Ord, Show)

However, this is not an ideal structure, because the parser actually expects that the list of Expr in an Extern will only ever contain expressions representing variables (i.e. parameters in this situation cannot be arbitrary expressions). I would like to make the types reflect this constraint (making it easier to generate random valid ASTs using QuickCheck); however, for the sake of consistency in the parser functions (which all have type Parser Expr), I don't just want to say | Expr Name [Name]. I would like to do something like this:

data Expr
  = ...
  | Var String
    ...
  | Function Name [Expr] Expr
  | Extern Name [Var] -- enforce constraint here
  deriving (Eq, Ord, Show)

But it's not possible in Haskell.

To summarize, Extern and Var should both be Expr, and Extern should have a list of Vars representing parameters. Would the best way be to split all of these out and make them instances of an Expr typeclass (that wouldn't have any methods)? Or is there a more idiomatic method (or would it be better to scrap these types and do something totally different)?


Solution

  • Disclaimer, I'm the author of the LLVM tutorial you mentioned.

    Just use Extern Name [Name], everything after Chapter 3 onward in the tutorial uses that exact definition anyways. I think I just forgot to make Chapter 2 Syntax.hs consistent with the others.

    I wouldn't worry about making the parser definitions consistent, it's fine for them to return different types. Here's what the later parsers use. identifier is just the parsec builtin for the alphanumeric identifier from the LanguageDef that becomes the Name type in the AST.

    extern :: Parser Expr
    extern = do
      reserved "extern"
      name <- identifier
      args <- parens $ many identifier
      return $ Extern name args