Search code examples
parsingabstract-syntax-treestate-machine

LL top-down parser, from CST to AST


I am currently learning about syntax analysis, and more especially, top-down parsing.

I know the terminology and the difference with bottom-up LR parsers, and since top-down LL parsers are easier to implement by hand, I am looking forward to make my own.

I have seen two kinds of approach:

  • The recursive-descent one using a collection of recursive functions.
  • The stack-based and table-driven automaton as shown here on Wikipedia.

I am more interested by the latter, for its power and its elimination of call-stack recursion. However, I don't understand how to build the AST from the implicit parse tree.

This code example of a stack-based finite automaton show the parser analyzing the input buffer, only giving a yes/no answer if the syntax has been accepted.

I have heard of stack annotations in order to build the AST, but I can't figure out how to implement them. Can someone provide a practical implementation of such technique ?


Solution

  • You need to understand the concept behind. You need to understand the concept of pushdown automaton. After you understand how to make computation on paper with pencil you will be able to understand multiple ways to implement its idea, via recursive descent or with stack. The ideas are the same, when you use recursive descent you implicitly have the stack that the program use for execution, where the execution data is combined with the parsing automaton data.

    I suggest you to start with the course taught by Ullman (automata) or Dick Grune, this one is the best focused on parsing. (the book of Grune is this one), look for the 2nd edition.

    For LR parsing the essential is to understand the ideas of Earley, from these ideas Don Knuth created the LR method.

    For LL parsing, the book of Grune is excellent, and Ullman presents the computation on paper, the math background of parsing that is essential to know if you want to implement your own parsers.

    Concerning the AST, this is the output of parsing. A parser will generate a parsing tree that is transformed in AST or can generate and output directly the AST.