
Pyparsing: parse nested brace blocks with a specific header


I found several topics about pyparsing that deal with almost the same problem of parsing nested blocks, but even with those I can't find a solution to my errors.

I have the following format:

key value;
header_name "optional_metadata"
{
     key value;
     sub_header_name
     {
        key value;
     };
};
key value;
  • A key is alphanumeric
  • A value may be an int or a string made of alphanumerics plus "@._"
  • key/value pairs may appear after a brace block
  • key/value pairs may appear in the file before the first brace block
  • key/value pairs before or after a brace block are optional
  • A header may have a name
  • A closing brace is followed by a semicolon

I used the following parser:

from pyparsing import Forward, Group, Literal, OneOrMore, Optional, Word, alphanums, srange

VALID_KEY_CHARACTERS = alphanums
VALID_VALUE_CHARACTERS = srange("[a-zA-Z0-9_\"'\\-\\.@]")

lbr = Literal( '{' ).suppress()
rbr = Literal( '}' ).suppress() + Literal(";").suppress()

expr = Forward()
atom = Word(VALID_KEY_CHARACTERS) + Optional(Word(VALID_VALUE_CHARACTERS))
pair = atom | lbr + OneOrMore( expr ) + rbr
expr << Group( atom + pair )

When I use it, I only get "header_name" and the header metadata. After modifying it, I only get the key/value pairs inside a brace block, and then Python raises a parsing error (it expects '}' when it reaches sub_header_name).

Can anyone help me understand why? Thank you.


Solution

  • I think the main problem is that your grammar does not fully describe the input, which leads to several mismatches. The two main problems I saw were that you forgot that each key-value pair must end in a semicolon, and that you did not specify that a key-value pair can come after a closing curly brace. It also looks like the lines:

    pair = atom | lbr + OneOrMore( expr ) + rbr
    expr << Group( atom + pair )
    

    ...would require each set of curly braces to contain, at minimum, two key-value pairs, or a key-value pair followed by a set of curly braces. I believe this would cause an error once you encounter the lines:

    {
        key value;
    };
    

    ...within your input, though I'm not entirely certain.
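To check that hunch, here is a small sketch that runs the question's grammar (unchanged apart from the imports) against two toy inputs: a block holding a single `key value;` pair, and a block holding two bare atoms with no semicolons. The `try_parse` helper and the sample strings are hypothetical, chosen only for illustration. The samples use the name `block` rather than `sub_header_name` because `alphanums` contains no underscore, so `Word(alphanums)` on its own would stop at the first `_`.

```python
from pyparsing import (Forward, Group, Literal, OneOrMore, Optional,
                       ParseException, Word, alphanums, srange)

# The grammar from the question, unchanged apart from the imports.
VALID_KEY_CHARACTERS = alphanums
VALID_VALUE_CHARACTERS = srange("[a-zA-Z0-9_\"'\\-\\.@]")

lbr = Literal('{').suppress()
rbr = Literal('}').suppress() + Literal(';').suppress()

expr = Forward()
atom = Word(VALID_KEY_CHARACTERS) + Optional(Word(VALID_VALUE_CHARACTERS))
pair = atom | lbr + OneOrMore(expr) + rbr
expr << Group(atom + pair)


def try_parse(text):
    """Hypothetical helper: return the parse result, or the ParseException."""
    try:
        return expr.parseString(text)
    except ParseException as err:
        return err


for sample in ("block { key value; };", "block { a b c d };"):
    result = try_parse(sample)
    if isinstance(result, ParseException):
        print(sample, "-> ParseException:", result)
    else:
        print(sample, "->", result.asList())
```

The first sample fails: after the atom `key value`, `pair` needs either another atom or a `{`, but finds `;`. The second sample parses to `[['block', ['a', 'b', 'c', 'd']]]`, which suggests the braces only accept atoms in pairs and never a trailing semicolon.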

    In any case, after playing around with your grammar, I ended up with this:

    from pyparsing import Forward, Group, Literal, Optional, Word, srange
    
    data = """key1 value1; 
    header_name "optional_metadata"
    {
         key2 value2;
         sub_header_name
         {
            key value;
         };
    };
    key3 value3;"""
    
    # I'm reusing the key characters for the header names, which can contain an underscore
    VALID_KEY_CHARACTERS = srange("[a-zA-Z0-9_]")
    VALID_VALUE_CHARACTERS = srange("[a-zA-Z0-9_\"'\\-\\.@]")
    
    semicolon = Literal(';').suppress()
    lbr = Literal('{').suppress()
    rbr = Literal('}').suppress()
    
    key = Word(VALID_KEY_CHARACTERS)
    value = Word(VALID_VALUE_CHARACTERS)
    
    key_pair = Group(key + value + semicolon)("key_pair")
    metadata = Group(key + Optional(value))("metadata")
    
    header = key_pair + Optional(metadata)
    
    expr = Forward()
    contents = Group(lbr + expr + rbr + semicolon)("contents")
    expr << header + Optional(contents) + Optional(key_pair)
    
    # asXML() exists only in pyparsing 2.x; it was removed in pyparsing 3.0
    print(expr.parseString(data).asXML())
    

    This results in the following output:

    <key_pair>
      <key_pair>
        <ITEM>key1</ITEM>
        <ITEM>value1</ITEM>
      </key_pair>
      <metadata>
        <ITEM>header_name</ITEM>
        <ITEM>&quot;optional_metadata&quot;</ITEM>
      </metadata>
      <contents>
        <key_pair>
          <ITEM>key2</ITEM>
          <ITEM>value2</ITEM>
        </key_pair>
        <metadata>
          <ITEM>sub_header_name</ITEM>
        </metadata>
        <contents>
          <key_pair>
            <ITEM>key</ITEM>
            <ITEM>value</ITEM>
          </key_pair>
        </contents>
      </contents>
      <key_pair>
        <ITEM>key3</ITEM>
        <ITEM>value3</ITEM>
      </key_pair>
    </key_pair>
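`ParseResults.asXML()` was removed in pyparsing 3.0, so on a current install the print call above will not work. Here is a sketch, assuming pyparsing 3, that keeps the answer's grammar as-is and checks the same nesting with `asList()` instead. The only other change is `<<=` in place of `<<`, which has the same effect on a `Forward`.

```python
from pyparsing import Forward, Group, Literal, Optional, Word, srange

data = """key1 value1;
header_name "optional_metadata"
{
     key2 value2;
     sub_header_name
     {
        key value;
     };
};
key3 value3;"""

# Same grammar as in the answer; only the output call differs.
VALID_KEY_CHARACTERS = srange("[a-zA-Z0-9_]")
VALID_VALUE_CHARACTERS = srange("[a-zA-Z0-9_\"'\\-\\.@]")

semicolon = Literal(';').suppress()
lbr = Literal('{').suppress()
rbr = Literal('}').suppress()

key = Word(VALID_KEY_CHARACTERS)
value = Word(VALID_VALUE_CHARACTERS)

key_pair = Group(key + value + semicolon)("key_pair")
metadata = Group(key + Optional(value))("metadata")
header = key_pair + Optional(metadata)

expr = Forward()
contents = Group(lbr + expr + rbr + semicolon)("contents")
expr <<= header + Optional(contents) + Optional(key_pair)

result = expr.parseString(data)
# Each Group becomes a nested list, mirroring the XML elements above.
print(result.asList())
```

Each `<key_pair>`, `<metadata>` and `<contents>` element in the XML corresponds to one nested list in the printed result.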
    

    I'm not entirely sure this is exactly what you were trying to accomplish, but hopefully it is close enough that you can tweak it to suit your particular task.
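One limitation of the grammar above is that it needs a key/value pair before every header and allows at most one pair after each block, whereas the question says pairs before or after a block are optional. Below is a more general sketch. It assumes that any number of pairs and blocks may appear in any order at every level, which is my reading of the question's format list rather than something the asker confirmed. The names `statement`, `block`, `parse_config`, `KEY_CHARS` and `VALUE_CHARS` are hypothetical. `KEY_CHARS` adds `_` to the alphanumerics so that names like `header_name` match.

```python
from pyparsing import (Forward, Group, Optional, ParseException, Suppress,
                       Word, ZeroOrMore, alphanums, srange)

# Hypothetical character sets, following the rules listed in the question.
KEY_CHARS = alphanums + "_"
VALUE_CHARS = srange("[a-zA-Z0-9_\"'\\-\\.@]")

LBRACE, RBRACE, SEMI = map(Suppress, "{};")

key = Word(KEY_CHARS)
value = Word(VALUE_CHARS)

# key value;
key_pair = Group(key("key") + value("value") + SEMI)

# header_name ["metadata"] { statement* };
block = Forward()
statement = key_pair | block
block <<= Group(key("name") + Optional(value("metadata"))
                + LBRACE + Group(ZeroOrMore(statement))("body") + RBRACE + SEMI)

# A file is any mix of pairs and blocks, in any order.
config = ZeroOrMore(statement)


def parse_config(text):
    # parseAll=True makes unparsed trailing text an error instead of ignoring it.
    return config.parseString(text, parseAll=True)


data = """key1 value1;
header_name "optional_metadata"
{
     key2 value2;
     sub_header_name
     {
        key value;
     };
};
key3 value3;"""

print(parse_config(data).asList())

try:
    parse_config("blk { a b; }")  # closing brace without its semicolon
except ParseException as err:
    print("rejected:", err)
```

Trying `key_pair` before `block` in the alternation is safe. A header line never ends in a semicolon, so `key_pair` fails on the `{` and pyparsing backtracks to `block`.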