Search code examples
c++compiler-constructionabstract-syntax-treelanguage-theory

Confusion over the abstract formatting of an AST declaration of functions


I'm working on the implementation of a programming language in C++, I'm getting to the AST generation stage.

I would like to use a 3-step procedure:

  1. Recognize the type of statement;
  2. Separate the tokens from the expression in lvalues rvalues and nodes as temporary and local AST;
  3. Design and add it to the global AST.

Here is what this would give for the declaration of a variable for example:

var MyVar : integer = 8 + 2;

Temporary form (rvalues / node / lvalues):

left:
    -left:
         "MyVar"
    -node:
         ":"
    -right:
         "integer"
node:
     "="
right:
    -left:
         "8"
    -node:
         "+"
    -right:
         "2"

Represented as a classic AST:

           "="
          /   \
         /     \
        /       \
      ":"       "+"
     /   \     /   \
    /     \  "8"   "2"
   /       \
"MyVar" "integer"

Then, the temporary tree is added to the global tree, specifying the type of declaration:

    [EXP]
      |
   VarDecl
      |
   { ... }

This works for everything except function declarations and function calls:

func add(a : integer, b : integer) : integer;

add(8, 2);

Indeed, for this type of expressions, there are no nodes to distinguish the lvalue from the rvalue. I also have no idea how to represent the function parameters. I had thought of something like this:

left:
    "add"
    params:
        [
         -left:
              "a"
         -node:
              ":"
         -right:
               "integer"
        ]
        [
         -left:
              "b"
         -node:
              ":"
         -right:
               "integer"
        ]
node:
    ":"
right:
    "integer"

Idem for call:

left:
    "add"
params:
    [
      "8"
    ]
    [
     "2"
    ]

But I feel like there's no logic left if I do this.

So, I was wondering if there wasn't a way of doing things that was close to mine in order to improve it, or if mine had to be completely revised.

PS: I am quite new in the field of abstract syntax analysis and trees, but I have read a lot of documents and tutorials about this subject.


Solution

  • First I would recommend looking into bison/flex for C++ or another parser generator since you can more easily group statements into a tree structure.

    For you function parameter problem, an AST is not just right node left. You can have multiple(>2) branches underneath a node and think of those branches as their grammar expressions instead of literal characters. This is where a lexer helps since you can abstract the characters into tokens, then the parser will abstract tokens into grammar structures. In general anything like a : integer should be abstracted into a grammar structure, possibly something called typed declaration.

    So func add(a : integer, b : integer) : integer; is really

    func identifier(params) : returnType

    and the nodes in the AST can keep track of the specific information.

    Namely your AST should be using 'characters' or 'tokens' but instead the interior nodes should abstractions of the grammar constructions of the language. Specifically for a parameter list, I would recommend it as a comma separated list of typed declarations, then the params node would have a list of declaration nodes for children.

    Also from you statement about adding the statement to the global tree it might be more useful to think of it as adding the statement to the global list of ASTs.

    Anyway kind of a weird answer, hope it helped.