Search code examples
parsinghaskellparsec

Custom ADT vs. Tree for parser return value


I'm using Parsec to build a simple Lisp parser.

What are the (dis)advantages of using a custom ADT for the parser types versus using a standard Tree (i.e. Data.Tree)?

After trying both ways, I've come up with a couple points for custom ADTs (i.e. Parser ASTNode):

  • seems to be much clearer and simpler
  • others have done it this way(including Tiger, which is/was bundled with Parsec)

and one against (i.e. Parser (Tree ASTNode):

  • Data.Tree already has Functor, Monad, etc. instances, which will be very helpful for semantic analysis, evaluation, calculating code statistics

For example:

  1. custom ADT

    import Text.ParserCombinators.Parsec
    
    
    data ASTNode 
      = Application ASTNode [ASTNode]
      | Symbol String
      | Number Float
      deriving (Show)
    
    
    int :: Parser ASTNode
    int = many1 digit >>= (return . Number . read)
    
    
    symbol :: Parser ASTNode
    symbol = many1 (oneOf ['a'..'z']) >>= (return . Symbol)
    
    
    whitespace :: Parser String
    whitespace = many1 (oneOf " \t\n\r\f")
    
    
    app :: Parser ASTNode
    app =
      char '(' >>
      sepBy1 expr whitespace >>= (\(e:es) ->
      char ')' >>
      (return $ Application e es))
    
    
    expr :: Parser ASTNode
    expr =  symbol  <|>  int  <|>  app
    

    example use:

    ghci> parse expr "" "(a 12 (b 13))"
    Right 
      (Application 
        (Symbol "a") 
        [Number 12.0, Application 
                        (Symbol "b") 
                        [Number 13.0]])
    
  2. Data.Tree

    import Text.ParserCombinators.Parsec
    import Data.Tree
    
    
    data ASTNode 
      = Application (Tree ASTNode)
      | Symbol String
      | Number Float
      deriving (Show)
    
    
    int :: Parser (Tree ASTNode)
    int = many1 digit >>= (\x -> return $ Node (Number $ read x) [])
    
    
    symbol :: Parser (Tree ASTNode)
    symbol = many1 (oneOf ['a' .. 'z']) >>= (\x -> return $ Node (Symbol x) [])
    
    
    whitespace :: Parser String
    whitespace = many1 (oneOf " \t\n\r\f")
    
    
    app :: Parser (Tree ASTNode)
    app =
      char '(' >>
      sepBy1 expr whitespace >>= (\(e:es) ->
      char ')' >>
      (return $ Node (Application e) es))
    
    
    expr :: Parser (Tree ASTNode)
    expr =  symbol  <|>  int  <|>  app
    

    and example use:

    ghci> parse expr "" "(a 12 (b 13))"
    Right
     (Node
       (Application 
         (Node (Symbol "a") []))
       [Node (Number 12.0) [],
        Node 
          (Application 
            (Node (Symbol "b") []))
          [Node (Number 13.0) []]])
    

    (sorry for the formatting -- hopefully it's clear)


Solution

  • I'd absolutely go for the AST, because interpretation/compilation/language analysis in general is very much driven by the structure of your language. The AST will simply and naturally represent and respect that structure, while Tree will do neither.

    For example, a common form of language implementation technique is to implement some complex features by translation: translate programs that involve those features or constructs into programs in a subset of the a language that does not use them (Lisp macros, for example, are all about this). If you use an AST, the type system will, for example, often forbid you from producing illegal translations as output. Whereas a Tree type that doesn't understand your program will not help there.

    Your AST doesn't look very complicated, so writing utility functions for it should not be hard. Take this one for example:

    foldASTNode :: (r -> [r] -> r) -> (String -> r) -> (Float -> r) -> r
    foldASTNode app sym num node = 
        case node of
          Application f args -> app (subfold f) (map subfold args)
          Symbol str         -> sym str
          Number n           -> num n
        where subfold = foldASTNode app sym num
    

    And in any case, what sort of Functor do you wish to have on your AST? There's no type parameter on it...