Search code examples
compiler-constructionhaskellgrammarrepresentation

Haskell - How to best to represent a programming language's grammar?


I've been looking at Haskell and I'd quite like to write a compiler in it (as a learning exercise), since a lot of its innate features can be readily applied to a compiler (particularly a recursive descent compiler).

What I can't quite get my head around is how to represent a language's grammar in a Haskell-ian way. My first thought was to use recursive data type definitions, but I can't see how I use them to match against keywords in the language ("if") for example.

Thoughts and suggestions greatly appreciated,

Pete


Solution

  • A recursive data type is fine for this. For example, given the language:

    expr ::= var
          |  "true"
          |  "false"
          |  "if" expr "then" expr "else" expr
          |  "(" expr ")"
    

    an example expression in this language would be:

    if true then x else (if false then y else true)
    

    Your Haskell data type would look something like this:

    data Expr = Var String
              | Lit Bool
              | If Expr Expr Expr
    

    Your parser then takes care to translate, e.g., x into Var "x", and true into Lit True, etc. I.e.:

    parse "if x then false else true" 
      ==  If (Var "x") (Lit False) (Lit True)
    

    For writing parsers you can roll your own using the techniques mentioned in Norman's answer, or using Parsec or use parser generators like Happy.