I found several topics about pyparsing. They deal with almost the same problem of parsing nested structures, but even with those, I can't find a solution to my errors.
I have the following format:
key value;
header_name "optional_metadata"
{
key value;
sub_header_name
{
key value;
};
};
key value;
I used the following parser:
VALID_KEY_CHARACTERS = alphanums
VALID_VALUE_CHARACTERS = srange("[a-zA-Z0-9_\"\'\-\.@]")
lbr = Literal( '{' ).suppress()
rbr = Literal( '}' ).suppress() + Literal(";").suppress()
expr = Forward()
atom = Word(VALID_KEY_CHARACTERS) + Optional(Word(VALID_VALUE_CHARACTERS))
pair = atom | lbr + OneOrMore( expr ) + rbr
expr << Group( atom + pair )
When I use it, I get only the "header_name" and "header_metadata". After I modified it, I got only the key/value inside a brace, and then a Python exception is raised to report a parsing error (it expects '}' when it reaches sub_header_name).
Can anyone help me understand why? Thank you.
I think that the main problem is that your grammar does not fully describe the input, leading to several mismatches. The two main problems I saw were that you forgot that each of your key/value pairs must end in a semicolon, and that you did not specify that a key/value pair can come after a closing curly brace. It also looks like the lines:
pair = atom | lbr + OneOrMore( expr ) + rbr
expr << Group( atom + pair )
...would require each set of curly braces to contain, at minimum, two key/value pairs or a key/value pair and a set of curly braces. I believe this would cause an error once you encounter the lines:
{
key value;
};
...within your input, though I'm not entirely certain.
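To check the missing-semicolon point in isolation, here is a minimal sketch that feeds a single "key value;" line to the grammar from the question. The value character set is written as a plain string rather than with srange, and the variable failed_at is only there for illustration; everything else is copied from the question.

```python
from pyparsing import (Forward, Group, Literal, OneOrMore, Optional,
                       ParseException, Word, alphanums)

# Grammar from the question (value characters spelled out as a plain string).
lbr = Literal("{").suppress()
rbr = Literal("}").suppress() + Literal(";").suppress()
expr = Forward()
atom = Word(alphanums) + Optional(Word(alphanums + "_\"'-.@"))
pair = atom | lbr + OneOrMore(expr) + rbr
expr << Group(atom + pair)

# After "key value" has been consumed as an atom, the grammar wants either
# another atom or an opening brace, but the next character is ';'.
failed_at = None  # illustrative name, not part of the original code
try:
    expr.parseString("key value;")
except ParseException as err:
    failed_at = err.loc
    print("parse failed at offset", err.loc)
```

Nothing in the grammar can consume the ';', so the parse stops there before any nesting is even reached.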
In any case, after playing around with your grammar, I ended up with this:
from pyparsing import *
data = """key1 value1;
header_name "optional_metadata"
{
key2 value2;
sub_header_name
{
key value;
};
};
key3 value3;"""
# I'm reusing the key characters for the header names, which can contain an underscore
VALID_KEY_CHARACTERS = srange("[a-zA-Z0-9_]")
VALID_VALUE_CHARACTERS = srange("[a-zA-Z0-9_\"\'\-\.@]")
semicolon = Literal(';').suppress()
lbr = Literal('{').suppress()
rbr = Literal('}').suppress()
key = Word(VALID_KEY_CHARACTERS)
value = Word(VALID_VALUE_CHARACTERS)
key_pair = Group(key + value + semicolon)("key_pair")
metadata = Group(key + Optional(value))("metadata")
header = key_pair + Optional(metadata)
expr = Forward()
contents = Group(lbr + expr + rbr + semicolon)("contents")
expr << header + Optional(contents) + Optional(key_pair)
# The call form of print works on Python 2 and 3; asXML() is available in pyparsing 2.x
print(expr.parseString(data).asXML())
This results in the following output:
<key_pair>
<key_pair>
<ITEM>key1</ITEM>
<ITEM>value1</ITEM>
</key_pair>
<metadata>
<ITEM>header_name</ITEM>
<ITEM>"optional_metadata"</ITEM>
</metadata>
<contents>
<key_pair>
<ITEM>key2</ITEM>
<ITEM>value2</ITEM>
</key_pair>
<metadata>
<ITEM>sub_header_name</ITEM>
</metadata>
<contents>
<key_pair>
<ITEM>key</ITEM>
<ITEM>value</ITEM>
</key_pair>
</contents>
</contents>
<key_pair>
<ITEM>key3</ITEM>
<ITEM>value3</ITEM>
</key_pair>
</key_pair>
I'm not entirely sure if this is exactly what you were trying to accomplish, but hopefully it is close enough that you can tweak it to suit your particular task.
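One limitation worth noting: the grammar above accepts at most one pair before a header and one after its block. If your real files can hold any number of pairs and nested headers, a more general sketch could look like the following. It assumes that every block is closed by "};" and that a header is a name with optional metadata, as in your sample; the names block, entry and document are my own.

```python
from pyparsing import (Forward, Group, Literal, Optional, Word,
                       ZeroOrMore, alphanums)

data = """key1 value1;
header_name "optional_metadata"
{
key2 value2;
sub_header_name
{
key value;
};
};
key3 value3;"""

semicolon = Literal(";").suppress()
lbr = Literal("{").suppress()
rbr = Literal("}").suppress()

key = Word(alphanums + "_")
value = Word(alphanums + "_\"'-.@")

# "key value;" becomes a two-item group.
key_pair = Group(key + value + semicolon)

# A block is: name, optional metadata, then "{ entries };".
# Forward lets a block contain further blocks.
block = Forward()
entry = key_pair | block
block <<= Group(key + Optional(value) + lbr
                + Group(ZeroOrMore(entry)) + rbr + semicolon)

# A document is any number of pairs and blocks, in any order.
document = ZeroOrMore(entry)

# parseAll=True makes the parser raise if some input is left unparsed.
result = document.parseString(data, parseAll=True).asList()
print(result)
```

Trying key_pair before block means a header with metadata is first attempted as a pair, fails on the missing semicolon, and is then matched as a block. The nested lists in result mirror the nesting of the braces.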