Using the pyparsing module, I am able to parse key/value pairs from an input file. They can look like the following:
key1=value1
key2="value2"
key3="value3 and some more text
"
key4="value4 and ""inserted quotes"" with
more text"
Using the following rules:
eq = Literal('=').suppress()
v1 = QuotedString('"')
v2 = QuotedString('"', multiline=True, escQuote='""')
value = Group(v1 | v2)("value")
kv = Group(key + eq + value)("key_value")
I now have a problem where quotes are used for line continuation within a quoted piece of text (!!!). Note that the quote here is not an escape character inside a key_value pair; it is a means of concatenating two adjacent lines.
Example:
key5="some more text that is so long that the authors who serialized it to a file thought it"
"would be a good idea to to concatenate strings this way"
Is there a way to handle this cleanly or should I try to identify these first and replace this concatenation method with another?
First off, your v2 expression is really a superset of your v1 expression. That is, anything that would match v1 will also match v2, so you don't really need value = v1 | v2; value = v2 will work.
Then, to handle the case with multiple "adjacent" quoted strings, instead of parsing for a single quoted string, parse for one or more, and then concatenate them with a parse action:
v2 = OneOrMore(QuotedString('"', multiline=True, escQuote='""'))
# add a parse action to convert multiple matched quoted strings to a single
# concatenated string
v2.addParseAction(''.join)
value = v2
# I made a slight change in this expression, moving the results names
# down into this compositional expression
kv = Group(key("key") + eq + value("value"))("key_value")
Using this test code:
for parsed_kv in kv.searchString(source):
    print(parsed_kv.dump())
    print()
will print:
[['key2', 'value2']]
- key_value: ['key2', 'value2']
  - key: 'key2'
  - value: 'value2'
[0]:
  ['key2', 'value2']
  - key: 'key2'
  - value: 'value2'

[['key3', 'value3 and some more text\n']]
- key_value: ['key3', 'value3 and some more text\n']
  - key: 'key3'
  - value: 'value3 and some more text\n'
[0]:
  ['key3', 'value3 and some more text\n']
  - key: 'key3'
  - value: 'value3 and some more text\n'

[['key4', 'value4 and "inserted quotes" with\nmore text']]
- key_value: ['key4', 'value4 and "inserted quotes" with\nmore text']
  - key: 'key4'
  - value: 'value4 and "inserted quotes" with\nmore text'
[0]:
  ['key4', 'value4 and "inserted quotes" with\nmore text']
  - key: 'key4'
  - value: 'value4 and "inserted quotes" with\nmore text'

[['key5', 'some more text that is so long that the authors who serialized it to a file thought it would be a good idea to to concatenate strings this way']]
- key_value: ['key5', 'some more text that is so long that the authors who serialized it to a file thought it would be a good idea to to concatenate strings this way']
  - key: 'key5'
  - value: 'some more text that is so long that the authors who serialized it to a file thought it would be a good idea to to concatenate strings this way'
[0]:
  ['key5', 'some more text that is so long that the authors who serialized it to a file thought it would be a good idea to to concatenate strings this way']
  - key: 'key5'
  - value: 'some more text that is so long that the authors who serialized it to a file thought it would be a good idea to to concatenate strings this way'
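
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same approach. Note that the question never shows how key is defined, so the Word(alphanums + "_") definition below is only an assumption, and the short source string is made-up sample input rather than the original file. The parse action is written as a lambda instead of ''.join, which should behave the same way.

```python
from pyparsing import Group, Literal, OneOrMore, QuotedString, Word, alphanums

# Assumption: keys are simple alphanumeric/underscore words; the original
# post does not show the definition of `key`.
key = Word(alphanums + "_")
eq = Literal("=").suppress()

# Match one or more adjacent quoted strings, then join them into one value.
value = OneOrMore(QuotedString('"', multiline=True, escQuote='""'))
value.addParseAction(lambda tokens: "".join(tokens))

kv = Group(key("key") + eq + value("value"))("key_value")

# Made-up sample input: key5 is split across two adjacent quoted strings.
source = '''key2="value2"
key5="first part "
"second part"
'''

# Each match holds a single group, so m[0] is the key/value pair.
pairs = {m[0]["key"]: m[0]["value"] for m in kv.searchString(source)}
print(pairs)
```

Collecting the matches into a dict is just for convenience here; if keys can repeat in your file, keep the list of parsed groups instead.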