parsing, data-structures, compiler-construction, abstract-syntax-tree, bnf

AST for this Mini-Language


I'm having trouble deciding how the Abstract Syntax Tree should be laid out in memory. Will it be a forest of trees, one per statement, or a single rooted binary tree?

Sample source:

P: 10
if A < 15:
    P: 9

Here's the BNF grammar:

<Prog>       ::= <Stmts>
<Stmts>      ::= <Stmt> | <Stmt> <Stmt>
<Stmt>       ::= <IfStmt> NL | <AssignStmt> NL
<AssignStmt> ::= <Id> : <Aexp> | <Indents> <AssignStmt>
<IfStmt>     ::= if <Lexp> : NL <Stmts> | <Indents> <IfStmt>
<Aexp>       ::= <Id> | <Int> | <Aexp> <AOP> <Aexp>
<Lexp>       ::= <Aexp> <LOP> <Aexp>
<LOP>        ::= < | > | & 
<AOP>        ::= + | - | * | /
<Int>        ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | <Int> <Int>
<Id>         ::= A | B | C | D | E | F | P
<Indents>    ::= SPC | SPC <Indents>

Here SPC represents a whitespace character and NL the newline character. Yes, it only allows 7 identifiers and positive integers.

It is easily lexed. However, I've searched a lot, and most of the AST examples I found only use mathematical expressions, which are quite easy to grasp. If you find that my grammar is incorrect, please say so. Also note that the syntax is inspired by Python; I've read its Lexical Analysis doc, but it doesn't even mention the word "tree".

Thanks in advance.


Solution

  • Since there may be multiple statements in the program and under each if statement, you can arrange statements as lists/arrays in memory. If you really want to use trees, you can, but they will be trees only formally: they will be degenerate and will look and function like lists. Think about it: each statement has practically no relation to its neighboring statements other than the order in which they appear and execute. Neighboring statements do not form a recursive structure; P: 10 and if A < 15: don't have any recursive relationship with each other.

    There doesn't seem to be a good reason for, or a clear advantage in, using trees to represent statements. You may still choose trees in order to have a single uniform data structure, however.

    As for expressions, they fit the tree idea nicely, since most operators are unary or binary: they take one or two inputs and produce an output, which can in turn be used as an input to some other operator. There's a clear recursion here.

    I think it would be practical to arrange the entire program as a list of sublists (for statements) and trees (for expressions). But you can use degenerate trees instead of (sub)lists.
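
To make the "lists for statements, trees for expressions" layout concrete, here is a minimal Python sketch. It is only an illustration, not a definitive design: the class names (Id, Int, BinOp, Assign, If) and the helpers evaluate and run are hypothetical, the AST for the sample source is built by hand rather than by a parser, and the meanings given to / (integer division) and & (logical and) are assumptions, since the question does not define them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union
import operator

# Expressions are recursive, so they form real trees.
@dataclass
class Id:
    name: str

@dataclass
class Int:
    value: int

@dataclass
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"

Expr = Union[Id, Int, BinOp]

# Statements are kept in plain lists; an If owns a sublist for its body.
@dataclass
class Assign:
    target: str
    value: Expr

@dataclass
class If:
    cond: BinOp
    body: List["Stmt"] = field(default_factory=list)

Stmt = Union[Assign, If]

# Hand-built AST for the sample source:
#   P: 10
#   if A < 15:
#       P: 9
program: List[Stmt] = [
    Assign("P", Int(10)),
    If(BinOp("<", Id("A"), Int(15)), [Assign("P", Int(9))]),
]

# Assumed semantics: "/" is integer division, "&" is logical and.
OPS = {
    "+": operator.add, "-": operator.sub,
    "*": operator.mul, "/": operator.floordiv,
    "<": operator.lt, ">": operator.gt,
    "&": lambda a, b: bool(a) and bool(b),
}

def evaluate(e: Expr, env: Dict[str, int]):
    """Walk an expression tree recursively."""
    if isinstance(e, Int):
        return e.value
    if isinstance(e, Id):
        return env[e.name]
    return OPS[e.op](evaluate(e.left, env), evaluate(e.right, env))

def run(stmts: List[Stmt], env: Dict[str, int]) -> Dict[str, int]:
    """Execute a statement list in order; recurse only into If bodies."""
    for s in stmts:
        if isinstance(s, Assign):
            env[s.target] = evaluate(s.value, env)
        elif evaluate(s.cond, env):
            run(s.body, env)
    return env
```

Calling run(program, {"A": 3}) walks the top-level list in order and enters the sublist of the If only when its condition tree evaluates to true. The only recursion on the statement side comes from nesting, not from neighboring statements.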