Search code examples
pythondiagrambnfebnf

Translating Python's formal language into Rail Diagrams


I am currently trying to translate Python's formal grammar (https://docs.python.org/3/reference/grammar.html) into Rail Diagrams. The website we are using http://www.bottlecaps.de/rr/ui is very helpful for most of it and we have changed many things by hand to fir the proper notation for it to create the rail diagram but there is still 50+ lines that are incorrect and very hard for us to fix as we are brand new to this. Is there an easier way to do this than changing it all by hand?

Note the website uses EBNF

Thanks for your time,


Solution

  • Write a parser that parses the grammar, then transform from parse-tree to required notation.

    The transformation itself is fairly simple:

    • replace '#' comment introducers by '//'
    • replace ':' tokens by '::='
    • replace '[' tokens by '('
    • replace ']' tokens by ')?'

    A suitable meta-grammar, in W3C notation, is

    Grammar  ::= Rule+ EOF
    Rule     ::= Nonterminal ':' Alternatives
    Alternatives
             ::= Alternative ( '|' Alternative )*
    Alternative
             ::= ( Symbol ( '*' | '+' )? )*
    Symbol   ::= Nonterminal
               | Terminal
               | '(' Alternatives ')'
               | '[' Alternatives ']'
    
    <?TOKENS?>
    
    Nonterminal
             ::= [a-z] [a-z_]*
    Terminal ::= [A-Z] [A-Z_]*
               | "'" [^']+ "'"
    EOF      ::= $
    IgnorableWhitespace
             ::= [ #x9#xA#xD]+
               | '#' [^#xA]* [#xA]
              /* ws: definition */
    

    Put it in grammar.ebnf, then use REx to create a parser for it, coded e.g. in XQuery, using this command line:

       -xquery -tree
    

    This gives you XQuery module grammar.xquery. Next, put the python grammar in python.grammar, and this XQuery program in transform.xquery:

    import module namespace p="grammar" at "grammar.xquery";
    declare option saxon:output "method=text";
    declare variable $input as xs:string external;
    for $token in p:parse-Grammar(unparsed-text($input))//text()
    return
      if (starts-with(normalize-space($token), "#")) then
        replace($token, "((^|&#xA;)[\s])*#", "$1//")
      else
        switch ($token)
        case ":" return "::="
        case "[" return "("
        case "]" return ")?"
        default return $token
    

    Then use Saxon to run it:

      java net.sf.saxon.Query transform.xquery input=python.grammar > python.ebnf
    

    The result is what you were looking for.

    Of course you can also use your favorite text editor to carefully do global replaces achieving the same. It's just so much more fun to do it right.