
Augmented Abstract Syntax Tree


Here is a simple grammar:

START = DECL DECL $ ;
DECL = TYPE NAME '=' VAL ;
TYPE = 'int' | 'float' ;
NAME = 'a' | 'b' ;
VAL = '4' ;

I parse this input stream with Grako:

int a = 4
float b = 4

and I get back this abstract syntax tree (shown as JSON):

[
  "int",
  "a",
  [
    "=",
    "4"
  ],
  [
    "float",
    "b",
    [
      "=",
      "4"
    ]
  ]
]

Is there a simple way to obtain ASTs like this:

[
  "int" TYPE,
  "a" NAME,
  [
    "=" DECL,
    "4" VAL
  ],
  [
    "float" TYPE,
    "b" NAME,
    [
      "=" DECL,
      "4" VAL
    ]
  ]
]

or this:

...
"int TYPE",
...

?

I believe semantic actions in the Grako-generated parser are the solution, but I can't figure out how to use them.

Is there a simple way to do this?
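A minimal sketch of the semantic-actions route mentioned above, assuming Grako's convention of looking up a method named after each rule on the semantics object, calling it with the AST that rule produced, and using the return value in its place. The class name TaggingSemantics and the parser class name in the comment are hypothetical; the sketch only shows how such a class could yield the "int TYPE" style of output.

```python
# Hypothetical semantics class for the grammar above. Assumption: Grako calls
# a method named after each rule with that rule's AST, and the method's
# return value replaces that AST in the final result.
class TaggingSemantics(object):
    def _tag(self, ast, rule):
        # Append the rule name to the matched token, e.g. 'int' -> 'int TYPE'.
        return '%s %s' % (ast, rule)

    def TYPE(self, ast):
        return self._tag(ast, 'TYPE')

    def NAME(self, ast):
        return self._tag(ast, 'NAME')

    def VAL(self, ast):
        return self._tag(ast, 'VAL')


# Passing it to a generated parser would look roughly like this
# ("MyGrammarParser" is a hypothetical name; the real class name
# depends on how the parser was generated):
#
#     parser = MyGrammarParser()
#     ast = parser.parse(text, rule_name='START', semantics=TaggingSemantics())

# Calling the methods directly shows what each rule's result would become.
semantics = TaggingSemantics()
print(semantics.TYPE('int'))
print(semantics.NAME('a'))
print(semantics.VAL('4'))
```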


Solution

  • The output format you propose is neither valid JSON nor valid Python. By using Grako's features for AST customization, you can instead obtain output that can be processed in Python, or in any other language that has a JSON library.

    Modify the grammar by adding an AST name to the elements of interest, like this:

    START = DECL DECL $ ;
    DECL = TYPE:TYPE NAME:NAME '=' VAL:VAL ;
    TYPE = 'int' | 'float' ;
    NAME = 'a' | 'b' ;
    VAL = '4' ;
    

    And you'll obtain output like this:

    AST:
    [AST({'NAME': 'a', 'VAL': '4', 'TYPE': 'int'}), AST({'NAME': 'b', 'VAL': '4', 'TYPE': 'float'})]
    
    JSON:
    [
      {
        "TYPE": "int",
        "NAME": "a",
        "VAL": "4"
      },
      {
        "TYPE": "float",
        "NAME": "b",
        "VAL": "4"
      }
    ]
    

    Each declaration is now a dict-like AST keyed by the names you added to the grammar, so the result is easy to process into whichever final output you need.
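As a sketch of that post-processing step, the snippet below turns the named-element output into the "int TYPE" style the question asks for. The list literal stands in for the JSON shown above; the assumption is that Grako's AST objects behave like dicts, so the same function should apply to them directly. The helper name tag_values is hypothetical.

```python
import json

# Stand-in for the parser output shown above: one dict per DECL.
# Assumption: Grako's AST objects are dict-like, so real output
# could be passed to tag_values() unchanged.
decls = [
    {"TYPE": "int", "NAME": "a", "VAL": "4"},
    {"TYPE": "float", "NAME": "b", "VAL": "4"},
]


def tag_values(decl):
    """Turn {'TYPE': 'int', ...} into ['int TYPE', ...] in grammar order."""
    order = ("TYPE", "NAME", "VAL")
    return ["%s %s" % (decl[key], key) for key in order]


tagged = [tag_values(d) for d in decls]
# The result is plain lists of strings, so it serializes to valid JSON.
print(json.dumps(tagged, indent=2))
```

Fixing the key order explicitly keeps the output in grammar order regardless of how the AST dict happens to order its keys.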