I am trying to understand how one can re-create a document parsed by a parser generated by Grako.
After burying myself deep in the Grako source code, I believe I have finally understood how one gets from the AST back to a generated document. Could somebody please check that my understanding below is correct, and let me know if there is a more straightforward method?
2. Create a class (a descendant of grako.model.Node) for every rule in one's grammar. Each class must at least have a constructor with parameters for every named element in the corresponding rule and store its values in a class property.
3. Create a class (a descendant of grako.codegen.ModelRenderer) defining the template for "code" generation for (more or less) each rule in one's grammar.
4. Call grako.codegen.CodeGenerator().render(...) to create the output.

Can this be right? This does not seem intuitive at all.
If you look at how Grako itself parses grammars, you'll notice that the step 2 classes are created synthetically by a ModelBuilderSemantics descendant:
# from grako/semantics.py
class GrakoSemantics(ModelBuilderSemantics):
    def __init__(self, grammar_name):
        super(GrakoSemantics, self).__init__(
            baseType=grammars.Model,
            types=grammars.Model.classes()
        )
        self.grammar_name = grammar_name
        self.rules = OrderedDict()
    ...
The classes are synthesized if they are not present in the types= parameter. All that ModelBuilderSemantics requires is that each grammar rule carries a parameter that gives the class name for the corresponding Node:
module::Module = .... ;
or,
module(Module) = ... ;
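The "synthesize the class if it was not supplied" idea can be sketched in plain Python. This is only an illustration of the mechanism, not Grako's implementation: Node here is a stand-in base class, and SynthesizingBuilder, build(), and the example rule names are hypothetical.

```python
class Node:
    """Stand-in base class for model nodes (not grako.model.Node)."""

    def __init__(self, **attributes):
        # Store every named element of the rule as an instance attribute.
        for name, value in attributes.items():
            setattr(self, name, value)


class SynthesizingBuilder:
    """Hypothetical builder: finds a node class by name, creating it on demand."""

    def __init__(self, base_type=Node, types=()):
        self.base_type = base_type
        # Classes supplied up front are registered under their own names.
        self.constructors = {t.__name__: t for t in types}

    def build(self, type_name, ast):
        cls = self.constructors.get(type_name)
        if cls is None:
            # No class was provided for this rule, so synthesize one.
            cls = type(type_name, (self.base_type,), {})
            self.constructors[type_name] = cls
        return cls(**ast)


class Module(Node):
    """A hand-written class, passed in through types=."""


builder = SynthesizingBuilder(types=[Module])
module = builder.build("Module", {"name": "main"})  # uses the supplied class
imp = builder.build("Import", {"target": "os"})     # class created on the fly
```

Here the name given in the rule ("Module", "Import") is all the builder needs: a hand-written class is used when one was registered, and a bare subclass of the base type is created otherwise.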
Step 3 is unavoidable, because the translation must be specified "somewhere". Grako's way allows for str templates specified inline, with dispatching done by CodeGenerator, which is my preferred way of doing translation. But I use grako.model.DepthFirstNodeWalker when I just need to pull information out of a model, like when generating a symbol table or computing metrics.
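To make the "pull information out of a model" case concrete, here is a rough, self-contained sketch of a depth-first walker that collects a symbol table. It does not use Grako's walker class; Node, DepthFirstWalker, SymbolCollector, the walk_<ClassName> naming convention, and the node classes are all assumptions made for the example.

```python
class Node:
    """Stand-in model node (not grako.model.Node)."""

    def __init__(self, **attributes):
        self.__dict__.update(attributes)

    def children(self):
        # Yield child nodes held directly or inside lists.
        for value in vars(self).values():
            if isinstance(value, Node):
                yield value
            elif isinstance(value, list):
                for item in value:
                    if isinstance(item, Node):
                        yield item


class Module(Node):
    pass


class Function(Node):
    pass


class Variable(Node):
    pass


class DepthFirstWalker:
    """Visits children first, then dispatches on the node's class name."""

    def walk(self, node):
        for child in node.children():
            self.walk(child)
        method = getattr(self, "walk_" + type(node).__name__, None)
        if method is not None:
            method(node)


class SymbolCollector(DepthFirstWalker):
    def __init__(self):
        self.symbols = []

    def walk_Function(self, node):
        self.symbols.append(("function", node.name))

    def walk_Variable(self, node):
        self.symbols.append(("variable", node.name))


model = Module(
    name="main",
    body=[
        Function(name="f", body=[Variable(name="x")]),
        Variable(name="y"),
    ],
)
collector = SymbolCollector()
collector.walk(model)
```

Because every node has a class, the walker needs no conditionals: node types without a matching method (Module here) are simply skipped.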
Step 3 cannot be automated because mapping the semantics of the source language to the semantics of the target language requires brainpower, even when the source and target are the same.
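As an illustration of where that brainpower goes, the following sketch regenerates source text from a small typed model using one str template per node class. It mimics the idea of inline templates with dispatch on the class, but it is not Grako's CodeGenerator or ModelRenderer; the TEMPLATES table, render(), and the node classes are hypothetical.

```python
class Node:
    """Stand-in model node (not grako.model.Node)."""

    def __init__(self, **attributes):
        self.__dict__.update(attributes)


class Assignment(Node):
    pass


class Sum(Node):
    pass


class Number(Node):
    pass


# One template per node class: this table is the hand-written translation.
TEMPLATES = {
    "Assignment": "{target} = {value};",
    "Sum": "{left} + {right}",
    "Number": "{value}",
}


def render(node):
    """Render a node by filling its class's template with rendered fields."""
    if isinstance(node, Node):
        template = TEMPLATES[type(node).__name__]
        fields = {name: render(value) for name, value in vars(node).items()}
        return template.format(**fields)
    # Plain values (strings, numbers) are emitted as text.
    return str(node)


model = Assignment(
    target="x",
    value=Sum(left=Number(value=1), right=Number(value=2)),
)
text = render(model)
```

Even in this round-trip case, details such as spacing and the trailing semicolon exist only in the templates, which is why they have to be written by hand.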
One can also get away with traversing the JSON-like Python structure that parse() or grako.model.Node.asjson() generates (the AST), as you suggest, but the processing code would be full of if-then-elseif to distinguish one dictionary from another, or one list from the other. With models, every dict in the hierarchy has a Python class as its type.
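For contrast, this is roughly what processing the untyped structure looks like. The shape of the dictionaries below is invented for the example and is not the output of any particular grammar; render_untyped() is a hypothetical helper.

```python
def render_untyped(ast):
    """Render a JSON-like AST by inspecting the shape of each value."""
    if isinstance(ast, list):
        return "\n".join(render_untyped(item) for item in ast)
    elif isinstance(ast, dict):
        # Dictionaries can only be told apart by the keys they carry.
        if "target" in ast:
            return "{} = {};".format(ast["target"], render_untyped(ast["value"]))
        elif "left" in ast and "right" in ast:
            return "{} + {}".format(
                render_untyped(ast["left"]), render_untyped(ast["right"])
            )
        else:
            raise ValueError("unrecognized AST node: {!r}".format(ast))
    # Anything else is treated as a literal.
    return str(ast)


ast = [
    {"target": "x", "value": {"left": 1, "right": 2}},
    {"target": "y", "value": 3},
]
text = render_untyped(ast)
```

Every new rule adds another branch to the chain, and two rules that happen to share key names become indistinguishable, which is the problem that typed model nodes avoid.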
In the end, Grako doesn't impose a way to create a model of what was parsed, nor a way to translate it into something else. In its basic form, Grako provides just a Concrete Syntax Tree (CST), or an Abstract Syntax Tree (AST) if element naming is used wisely. Everything else is produced by a specific semantics class, which can be whatever one desires.