python regex pyparsing bnf

Convert BNF grammar to pyparsing


How can I describe the grammar of the script language below using regex (or would pyparsing be better)? Here is the grammar in Backus–Naur Form:

<root>   :=     <tree> | <leaves>
<tree>   :=     <group> [* <group>] 
<group>  :=     "{" <leaves> "}" | <leaf>;
<leaves> :=     {<leaf>;} <leaf>
<leaf>   :=     <name> = <expression>{;}

<name>          := <string_without_spaces_and_tabs>
<expression>    := <string_without_spaces_and_tabs>

Example of the script:

{
 stage = 3;
 some.param1 = [10, 20];
} *
{
 stage = 4;
 param3 = [100,150,200,250,300]
} *
 endparam = [0, 1]

I'm using Python's re.compile and want to split everything into groups, something like this:

[ [ 'stage',       '3'],
  [ 'some.param1', '[10, 20]'] ],

[ ['stage',  '4'],
  ['param3', '[100,150,200,250,300]'] ],

[ ['endparam', '[0, 1]'] ]

Update: I've found that pyparsing is a much better solution than regex.


Solution

  • Pyparsing lets you simplify constructs like

    leaves :: {leaf} leaf
    

    to just

    OneOrMore(leaf)
    

    So one form of your BNF in pyparsing will look something like:

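    from pyparsing import (Group, OneOrMore, Suppress, Word, ZeroOrMore,
                           printables, quotedString)
    
    # Punctuation is matched but dropped from the results.
    LBRACE,RBRACE,EQ,SEMI = map(Suppress, "{}=;")
    name = Word(printables, excludeChars="{}=;")
    # quotedString goes first: | tries alternatives left to right, and Word
    # would otherwise consume the opening quote character itself.
    expr = quotedString | Word(printables, excludeChars="{}=;")
    
    # Group keeps each leaf and each braced block as its own nested list.
    leaf = Group(name + EQ + expr + SEMI)
    group = Group(LBRACE + ZeroOrMore(leaf) + RBRACE) | leaf
    tree = OneOrMore(group)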
    

    I added quotedString as an alternative expr, in case you want a value that includes one of the excluded characters. Wrapping leaf and group in Group preserves the brace structure in the parsed results.
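
To illustrate the quoted-string alternative, here is a minimal sketch with a made-up leaf (`title = "a;b {c}";` is not from the question). It also shows why the order of alternatives matters: pyparsing's `|` tries them left to right, and since a quote character is itself in `printables`, a `Word` listed first would match the opening quote and stop at the first excluded character. The names `leaf_quoted_first` and `leaf_word_first` are illustrative only.

```python
from pyparsing import Group, Suppress, Word, printables, quotedString

EQ, SEMI = map(Suppress, "=;")
name = Word(printables, excludeChars="{}=;")
bare = Word(printables, excludeChars="{}=;")

# Hypothetical leaf whose value contains excluded characters inside quotes.
text = 'title = "a;b {c}";'

# quotedString tried first: the whole quoted value is kept (quotes included).
leaf_quoted_first = Group(name + EQ + (quotedString | bare) + SEMI)
quoted_first = leaf_quoted_first.parseString(text)

# Word tried first: it matches '"a' and stops at the ';' inside the quotes.
leaf_word_first = Group(name + EQ + (bare | quotedString) + SEMI)
word_first = leaf_word_first.parseString(text)

print(quoted_first.asList())
print(word_first.asList())
```

If the surrounding quotes are not wanted in the result, pyparsing's `removeQuotes` parse action can be attached to a copy of `quotedString`.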

    Unfortunately, your sample doesn't quite conform to this BNF:

    1. the spaces in [10, 20] and [0, 1] make them invalid exprs

    2. some leaves have no terminating ;

    3. the lone * characters between groups are not handled by this parser

    This sample does parse successfully with the above parser:

    sample = """
    {
     stage = 3;
     some.param1 = [10,20];
    }
    {
     stage = 4;
     param3 = [100,150,200,250,300];
    }
     endparam = [0,1];
     """
    
    parsed = tree.parseString(sample)    
    parsed.pprint()
    

    Giving:

    [[['stage', '3'], ['some.param1', '[10,20]']],
     [['stage', '4'], ['param3', '[100,150,200,250,300]']],
     ['endparam', '[0,1]']]
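
If the original sample has to be parsed unchanged, the grammar can be loosened instead. The following is only a sketch under stated assumptions, not part of the parser shown earlier: an expression is taken to be everything up to a `;`, a brace, or the end of the line (so spaces are allowed), the trailing `;` is optional, `*` is treated as a repeatable separator between groups, and a bare leaf is wrapped in its own group to match the output the question asks for. The `Regex` pattern and the `strip` parse action are choices made for this sketch.

```python
from pyparsing import (Group, OneOrMore, Optional, Regex, Suppress, Word,
                       ZeroOrMore, printables)

LBRACE, RBRACE, EQ, SEMI, STAR = map(Suppress, "{}=;*")
name = Word(printables, excludeChars="{}=;*")

# Assumption: a value runs to the next ';', brace, or newline, spaces included.
expr = Regex(r"[^;{}\n]+").setParseAction(lambda t: t[0].strip())

# Assumption: the terminating ';' may be omitted.
leaf = Group(name + EQ + expr + Optional(SEMI))

# A braced block becomes one group; a bare leaf becomes a one-leaf group.
group = Group(LBRACE + OneOrMore(leaf) + RBRACE) | Group(leaf)

# Assumption: '*' separates any number of groups.
tree = group + ZeroOrMore(STAR + group)

sample = """
{
 stage = 3;
 some.param1 = [10, 20];
} *
{
 stage = 4;
 param3 = [100,150,200,250,300]
} *
 endparam = [0, 1]
"""

groups = tree.parseString(sample, parseAll=True).asList()
for g in groups:
    print(g)
```

Because values end at a newline in this sketch, an expression cannot span several lines; if that is needed, the `Regex` pattern would have to change.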