
Railroad diagrams in Pyparsing: How about Forward() declarations? Rule renaming?


I'm using pyparsing 3.0.9 with Python 3.9.16, and I'm trying to write a grammar for a (sub-)set of YAML. Not so much for the resulting parser as for the railroad diagrams. The current state of the program is shown below.

  • The grammar (defined here), as expected, has recursion (mappings can contain mappings). However, I can't figure out how (or where) to set the name so that it appears correctly in the diagram: on the Forward() declaration, or on the actual definition? Every combination I tried produces errors in the output.

  • If I declare rules that derive from a common 'ancestor', I have to declare each one with a copy() of that ancestor; otherwise set_name() fails for all but the last one. This seems logical, except it doesn't always seem to work.

  • Some parts of the diagrams seem to be incorrect (not corresponding to the definition). Example: The node definition produces alias twice at the start.

Can someone point me in the right direction?

My code:

import pyparsing as pp

def make_parser():
    mapping = pp.Forward().set_name('mapping')
    label = pp.Word(pp.alphanums + '-_')
    true_false = pp.one_of('yes no true false').set_name('true_false')

    anchor = label.copy().set_name('anchor')
    tag    = label.copy().set_name('tag')
    alias  = label.copy().set_name('alias')

    key_value = (
        (pp.Keyword('yaml-scalar-event') +
            (pp.Keyword('yaml-scalar-event') ^ mapping))
    ).set_name('key_value')

    mapping = (
        pp.Keyword('yaml-mapping-start-event') +
        pp.ZeroOrMore(key_value) +
        pp.Keyword('yaml-mapping-end-event')
    )

    sequence = (
        anchor ^
        tag
    ).set_name('sequence')

    scalar = (
        alias ^
        tag ^
        ('plain_implicit' + true_false) ^
        ('quoted_implicit' + true_false) ^
        mapping
    ).set_name('scalar')

    node = (
        alias ^
        scalar ^
        sequence ^
        mapping
    ).set_name('node')

    document = (
        pp.Keyword('yaml-document-start-event') +
        pp.ZeroOrMore(node) +
        pp.Keyword('yaml-document-end-event')
    ).set_name('document')

    stream = (
        pp.Keyword('yaml-stream-start-event') +
        pp.ZeroOrMore(document) +
        pp.Keyword('yaml-stream-end-event')
    ).set_name('stream')

    return stream


def test_parser():
    parser = make_parser()

    parser.create_diagram('yaml_grammar.html',
        vertical = 2)



def main(args):
    parser = make_parser()
    parser.create_diagram('yaml_grammar.html', vertical = 2)


if __name__ == '__main__':
    import sys
    sys.exit(main(sys.argv))

Which produces the following output:

[image: railroad diagram generated by the code above]


Solution

  • I love this! I like Michael Milton's addition of railroad diagramming to pyparsing, and I've done some very similar work just to get a railroad diagram. Your question raised some interesting points about the railroad diagramming process, and I'm making a few tweaks to the pyparsing diagramming code to make the diagrams better.

    First off, here are some changes in your parser to get a clean diagram:

    def make_parser():
        """
        stream ::= STREAM-START document* STREAM-END
        document ::= DOCUMENT-START node DOCUMENT-END
        node ::= ALIAS | SCALAR | sequence | mapping
        sequence ::= SEQUENCE-START node* SEQUENCE-END
        mapping ::= MAPPING-START (node node)* MAPPING-END
        """
    
        # when I define Forwards, I try to go to the lowest possible
        # term in the BNF, in this case node
        # mapping = pp.Forward().set_name('mapping')
        node = pp.Forward().set_name("node")
        label = pp.Word(pp.alphanums + '-_')
        true_false = pp.one_of('yes no true false').set_name('true_false')
    
        anchor = label.copy().set_name('anchor')
        tag    = label.copy().set_name('tag')
        alias  = label.copy().set_name('alias')
    
        # add Group around key_value to keep from merging it with surrounding
        # terms in the diagram
        key_value = pp.Group(
            node + node
            # (pp.Keyword('yaml-scalar-event') +
            #     (pp.Keyword('yaml-scalar-event') ^ mapping))
        )#.set_name('key_value')
        # I suppressed the key_value naming because I liked the explicit node-node
        # element in the diagram instead of the indirect key_value label.
    
        mapping = (
            # pyparsing will auto-promote strings to Literals, which should
            # be sufficient for your diagramming efforts, and less typing for you
            # (just so long as the string is immediately preceded or followed by
            # some kind of pyparsing ParserElement)
            # pp.Keyword('yaml-mapping-start-event') +
            'yaml-mapping-start-event' +
            # replaced ZeroOrMore usage with [...], purely a style choice
            # pp.ZeroOrMore(key_value) +
            key_value[...] +
            'yaml-mapping-end-event'
        ).set_name("mapping")
    
        sequence = (
            anchor ^
            tag
        ).set_name('sequence')
    
        scalar = pp.Group(
            # alias and mapping are already included in node
            # alias ^
            tag ^
            ('plain_implicit' + true_false) ^
            ('quoted_implicit' + true_false) #^
            # mapping
        ).set_name('scalar')
    
        # IMPORTANT!!! - be sure to use '<<=', not '=' when defining the expression
        # that needs to be parsed by a Forward.
        node <<= (
            alias ^
            scalar ^
            sequence ^
            mapping
        ).set_name('node')
    
        document = (
            'yaml-document-start-event' +
            node[...] +
            'yaml-document-end-event'
        ).set_name('document')
    
        stream = (
            'yaml-stream-start-event' +
            document[...] +
            'yaml-stream-end-event'
        ).set_name('stream')
    
        return stream
    

    My changes were:

    • make node the Forward instead of mapping
    • MUST USE "<<=" operator to define contents of a Forward (this is a common mistake, and pyparsing offers some diagnostic warnings to help catch it)
    • changed key_value to just node + node, per the BNF
    • removed alias and mapping from scalar, since they were being duplicated with node in the diagram
    • cosmetic changes
      • changed Keyword literals to simple string literals
      • used [...] for repetition
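
    The '<<=' point is easy to get wrong, so here is a minimal sketch on a toy grammar (nested lists of integers, not the YAML grammar above). The names integer, item, nested and placeholder are made up for this illustration. It also uses the string auto-promotion and the [...] repetition notation from the list above:

```python
import pyparsing as pp

integer = pp.Word(pp.nums).set_name("integer")

# Correct: declare the Forward first, then attach its definition with '<<='.
item = pp.Forward().set_name("item")
# The bare "[" and "]" strings are auto-promoted to Literals because they
# are combined with a ParserElement (item[...]).
nested = pp.Group("[" + item[...] + "]").set_name("nested")
item <<= integer | nested

result = nested.parse_string("[1 [2 3] 4]", parse_all=True)
print(result.as_list())  # [['[', '1', ['[', '2', '3', ']'], '4', ']']]

# Incorrect: '=' only rebinds the Python variable. The original Forward,
# which other expressions may already refer to, never gets a definition.
broken_item = pp.Forward().set_name("broken_item")
placeholder = broken_item        # keep a handle on the original Forward
broken_item = integer | nested   # a brand-new expression; the Forward is untouched
print(placeholder.expr is None)  # True
```

    In the original code, key_value refers to the empty mapping Forward, while the later "mapping = ..." line creates an unrelated expression under the same variable name.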

    Using this code, to create the diagram:

        parser.create_diagram(
            'yaml_grammar.html',
            show_groups=False,
            vertical=2,
        )
    

    gives this diagram:

    [image: pyparsing railroad diagram, showing a bug in pyparsing]

    I didn't like a couple of things. For one, even though I set show_groups to False, we still see a grouping around the key-value nodes - a bug I have now fixed. Also, using the (2) repetition indicator feels clunky when the repetition is only 2 elements long, so I've special-cased repetition to only use this notation for 3 or more elements.

    With these fixes/changes (to be in the next pyparsing release), I now get this diagram, which I hope is close to your intended look (and I'm sorry to have taken so long to respond on this).

    [image: improved pyparsing diagram]
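
    On the copy() point from the question: set_name() renames the element it is called on and returns that same element, so without copy() every name lands on one shared object and the last one wins. A small sketch of that behavior (the shared_* variable names are illustrative only):

```python
import pyparsing as pp

label = pp.Word(pp.alphanums + "-_")

# Without copy(): set_name() mutates 'label' in place and returns it,
# so both variables point at the same object and the last name wins.
shared_anchor = label.set_name("anchor")
shared_tag = label.set_name("tag")
same_object = shared_anchor is shared_tag
last_name_wins = str(shared_anchor)

# With copy(): each rule is a separate element carrying its own name.
anchor = label.copy().set_name("anchor")
tag = label.copy().set_name("tag")

print(same_object, last_name_wins, str(anchor), str(tag))
```

    This only covers why copy() is needed; it does not explain the cases where naming still appeared not to work.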