
Grako "code" generation


I am trying to understand how one can re-create a document parsed by a parser generated by grako.

After burying myself deep in the Grako source code, I believe I have finally understood how one gets from the AST back to a generated document. Could somebody please check that my understanding below is correct, and let me know if there is a more straightforward method?

  1. One creates a PEG grammar for the language one wishes to parse. Grako creates a parser class and a semantics class based on it.
  2. One creates (by hand) a Python module containing (more or less) a separate class (a subclass of grako.model.Node) for every rule in one's grammar. Each class must at least have a constructor with a parameter for every named element in the corresponding rule, and must store those values as attributes.
  3. One subclasses (by hand) the generated semantics class to replace the AST for each rule with the corresponding class created in step 2.
  4. One creates (by hand) a Python module containing a subclass of grako.codegen.ModelRenderer that defines the template for "code" generation for (more or less) each rule in one's grammar.
  5. One feeds the AST consisting of the Node subclasses, together with the Python module containing the templates, to grako.codegen.CodeGenerator().render(...) to create the output.

Can this be right? This does not seem intuitive at all.

  • Why would one go through the significant effort of steps 2 and 3 to do nothing more than store information that is already contained in the AST?
  • What is the advantage of this approach rather than working directly from the AST?
  • Is there a way to automate or sidestep steps 2 & 3 if one only wants to recreate a document in the original grammar?
  • Given a PEG grammar definition, is it theoretically possible to automatically create a "code generator generator" just as one creates a "parser generator"?

Solution

  • If you look at how Grako itself parses grammars, you'll notice that the step 2 classes are created synthetically by a ModelBuilderSemantics descendant:

    # from grako/semantics.py
    class GrakoSemantics(ModelBuilderSemantics):
        def __init__(self, grammar_name):
            super(GrakoSemantics, self).__init__(
                baseType=grammars.Model,
                types=grammars.Model.classes()
            )
            self.grammar_name = grammar_name
            self.rules = OrderedDict()
    ...
    

    The classes are synthesized if they are not present in the types= parameter. All that ModelBuilderSemantics requires is that each grammar rule carries a parameter that gives the class name for the corresponding Node:

    module::Module = .... ;
    

    or,

    module(Module) = ... ;
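
The idea of synthesizing a class from the name given in a rule can be sketched in plain Python. This is only an illustration of the mechanism, not Grako's implementation: `Node` here is a stand-in for grako.model.Node, and `SketchModelBuilder` and its `build` method are hypothetical names invented for the example.

```python
# Illustration only: NOT Grako's code. All names below are hypothetical.

class Node:
    """Stand-in for grako.model.Node: stores named elements as attributes."""
    def __init__(self, **attributes):
        for name, value in attributes.items():
            setattr(self, name, value)


class SketchModelBuilder:
    """Maps the class name attached to a rule to a Node subclass."""
    def __init__(self, base_type=Node, types=()):
        self.base_type = base_type
        # Hand-written classes, if any, take precedence.
        self.constructors = {t.__name__: t for t in types}

    def build(self, type_name, ast):
        cls = self.constructors.get(type_name)
        if cls is None:
            # No class with this name was supplied: synthesize one.
            cls = type(type_name, (self.base_type,), {})
            self.constructors[type_name] = cls
        return cls(**ast)


builder = SketchModelBuilder()
module = builder.build('Module', {'name': 'demo', 'body': []})
print(type(module).__name__, module.name)
```

With a builder along these lines, hand-written classes are only needed where a node requires behavior beyond holding its named elements.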
    

    Step 3 is unavoidable, because the translation must be specified "somewhere". Grako's way allows for str templates specified inline with dispatching done by CodeGenerator, which is my preferred way of doing translation. But I use grako.model.DepthFirstNodeWalker when I just need to pull information out of a model, like when generating a symbol table or computing metrics.
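
A rough sketch of template-based translation with dispatch on the node's class might look like the following. It assumes a toy model with two node types; `Assign`, `Number`, `TEMPLATES` and `render` are made up for the example and are not part of Grako, whose CodeGenerator and ModelRenderer work differently in detail.

```python
# Illustration only: hypothetical names, not Grako's API.

class Node:
    """Stand-in for a model node: stores named elements as attributes."""
    def __init__(self, **attributes):
        self.__dict__.update(attributes)


class Assign(Node):
    pass


class Number(Node):
    pass


# One str template per node class, keyed by class name.
TEMPLATES = {
    'Assign': '{target} = {value};',
    'Number': '{value}',
}


def render(node):
    """Render a node by looking up the template for its class name."""
    if isinstance(node, list):
        return '\n'.join(render(item) for item in node)
    if not isinstance(node, Node):
        return str(node)  # plain values (tokens) render as themselves
    template = TEMPLATES[type(node).__name__]
    # Render child fields first, then fill them into the template.
    fields = {name: render(value) for name, value in vars(node).items()}
    return template.format(**fields)


program = [
    Assign(target='x', value=Number(value=1)),
    Assign(target='y', value=Number(value=42)),
]
print(render(program))
```

The point of the design is that the translation lives in the templates, while the dispatch code stays the same no matter how many node types the grammar defines.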

    Step 3 cannot be automated because mapping the semantics of the source language to the semantics of the target language requires brainpower, even when the source and target are the same.

    One can also get away with traversing the JSON-like Python structure that parse() or grako.model.Node.asjson() generates (the AST), as you suggest, but the processing code would be full of if/elif/else chains to distinguish one dictionary from another, or one list from another. With models, every dict in the hierarchy has a Python class as its type, so the processing code can dispatch on that class instead.
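
The difference can be seen in a small sketch that collects declared names from a toy tree, once from raw dictionaries and once from typed nodes. The AST shape, the `Assign` and `Function` classes, and the `walk_<ClassName>` naming convention are assumptions made for this example; they are meant to resemble the walker style mentioned above, not to reproduce grako.model.DepthFirstNodeWalker.

```python
# Illustration only: hypothetical AST shape and class names.

def names_from_ast(ast):
    """Raw AST: each dict has to be recognized by the keys it carries."""
    if isinstance(ast, list):
        return [n for item in ast for n in names_from_ast(item)]
    if isinstance(ast, dict):
        if 'target' in ast and 'value' in ast:    # looks like an assignment
            return [ast['target']] + names_from_ast(ast['value'])
        elif 'name' in ast and 'body' in ast:     # looks like a function
            return [ast['name']] + names_from_ast(ast['body'])
    return []


class Node:
    def __init__(self, **attributes):
        self.__dict__.update(attributes)


class Assign(Node):
    pass


class Function(Node):
    pass


class SymbolWalker:
    """Typed model: dispatch on the node's class name, no key sniffing."""
    def walk(self, node):
        if isinstance(node, list):
            return [n for item in node for n in self.walk(item)]
        method = getattr(self, 'walk_' + type(node).__name__, None)
        return method(node) if method else []

    def walk_Assign(self, node):
        return [node.target] + self.walk(node.value)

    def walk_Function(self, node):
        return [node.name] + self.walk(node.body)


ast = [{'name': 'f', 'body': [{'target': 'x', 'value': 1}]},
       {'target': 'y', 'value': 2}]
model = [Function(name='f', body=[Assign(target='x', value=1)]),
         Assign(target='y', value=2)]

print(names_from_ast(ast))
print(SymbolWalker().walk(model))
```

Both versions produce the same list here, but adding a node type to the first means another key test in the chain, while the second only needs one more `walk_` method.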

    In the end, Grako doesn't impose a way to create a model of what was parsed, nor a way to translate it into something else. In its basic form, Grako provides only a Concrete Syntax Tree (CST), or an Abstract Syntax Tree (AST) if element naming is used wisely. Everything else is produced by a specific semantics class, which can be whatever one desires.