pyparsing key value pairs with quotes and line continuation


Using the pyparsing module, I am able to parse key/value pairs from an input file. They look like the following:

key1=value1
key2="value2"
key3="value3 and some more text
"
key4="value4 and ""inserted quotes"" with
more text"

Using the following rules:

eq = Literal('=').suppress()
v1 = QuotedString('"')
v2 = QuotedString('"', multiline=True, escQuote='""')
value = Group(v1 | v2)("value")
kv = Group(key + eq + value)("key_value")

I now have a problem: some values use quotes for line continuation inside a quoted piece of text (!!!). Note that here the quote appears within a key_value pair not as an escape character, but as a means of concatenating two adjacent lines.

Example:

key5="some more text that is so long that the authors who serialized it to a file thought it"
"would be a good idea to to concatenate strings this way"

Is there a way to handle this cleanly or should I try to identify these first and replace this concatenation method with another?
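
A minimal reproduction of the problem might look like the following. The definition of `key` is not shown above, so `Word(alphanums)` is assumed here, and the short `source` string is a made-up stand-in for the real file.

```python
from pyparsing import Group, Literal, QuotedString, Word, alphanums

# Assumption: `key` is not defined in the rules above; a plain
# alphanumeric word is used here for illustration only.
key = Word(alphanums)
eq = Literal('=').suppress()
v1 = QuotedString('"')
v2 = QuotedString('"', multiline=True, escQuote='""')
value = Group(v1 | v2)("value")
kv = Group(key + eq + value)("key_value")

# Hypothetical input: one value split over two adjacent quoted strings.
source = 'key5="first part "\n"second part"\n'

matches = kv.searchString(source)
for match in matches:
    print(match.asList())
```

With these rules only the first quoted fragment ends up in the value; the second quoted string on the continuation line is not picked up.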


Solution

  • First off, your v2 expression is really a superset of your v1 expression. That is, anything that matches v1 will also match v2, so you don't need value = v1 | v2; value = v2 on its own will work.

    Then, to handle the case with multiple "adjacent" quoted strings, instead of parsing for a single quoted string, parse for one or more, and then concatenate them with a parse action:

    v2 = OneOrMore(QuotedString('"', multiline=True, escQuote='""'))
    
    # add a parse action to convert multiple matched quoted strings to a single
    # concatenated string
    v2.addParseAction(''.join)
    
    value = v2
    
    # I made a slight change in this expression, moving the results names
    # down into this compositional expression
    kv = Group(key("key") + eq + value("value"))("key_value")
    

    Using this test code:

    for parsed_kv in kv.searchString(source):
        print(parsed_kv.dump())
        print()
    

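    will print the following (key1 does not appear, since its unquoted value does not match the quoted-string expression):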

    [['key2', 'value2']]
    - key_value: ['key2', 'value2']
      - key: 'key2'
      - value: 'value2'
    [0]:
      ['key2', 'value2']
      - key: 'key2'
      - value: 'value2'
    
    [['key3', 'value3 and some more text\n']]
    - key_value: ['key3', 'value3 and some more text\n']
      - key: 'key3'
      - value: 'value3 and some more text\n'
    [0]:
      ['key3', 'value3 and some more text\n']
      - key: 'key3'
      - value: 'value3 and some more text\n'
    
    [['key4', 'value4 and "inserted quotes" with\nmore text']]
    - key_value: ['key4', 'value4 and "inserted quotes" with\nmore text']
      - key: 'key4'
      - value: 'value4 and "inserted quotes" with\nmore text'
    [0]:
      ['key4', 'value4 and "inserted quotes" with\nmore text']
      - key: 'key4'
      - value: 'value4 and "inserted quotes" with\nmore text'
    
    [['key5', 'some more text that is so long that the authors who serialized it to a file thought it would be a good idea to to concatenate strings this way']]
    - key_value: ['key5', 'some more text that is so long that the authors who serialized it to a file thought it would be a good idea to to concatenate strings this way']
      - key: 'key5'
      - value: 'some more text that is so long that the authors who serialized it to a file thought it would be a good idea to to concatenate strings this way'
    [0]:
      ['key5', 'some more text that is so long that the authors who serialized it to a file thought it would be a good idea to to concatenate strings this way']
      - key: 'key5'
      - value: 'some more text that is so long that the authors who serialized it to a file thought it would be a good idea to to concatenate strings this way'
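
For anyone who wants to run the approach above end to end, here is a self-contained sketch. It assumes `key` is a simple alphanumeric word (the question does not show its definition) and uses a shortened, made-up `source` string rather than the original file. The camelCase names follow the answer; pyparsing 3 also provides snake_case equivalents such as `add_parse_action` and `search_string`.

```python
from pyparsing import Group, Literal, OneOrMore, QuotedString, Word, alphanums

# Assumption: `key` is not shown in the question; an alphanumeric word is used.
key = Word(alphanums)
eq = Literal('=').suppress()

# One or more adjacent quoted strings, merged into a single token.
value = OneOrMore(QuotedString('"', multiline=True, escQuote='""'))
value.addParseAction(''.join)

kv = Group(key("key") + eq + value("value"))

# Hypothetical sample input. Note the trailing space before the first
# closing quote of key5: the fragments are joined with no separator.
source = '''\
key2="value2"
key4="value4 and ""inserted quotes"" with
more text"
key5="a long line that was split "
"across two quoted strings"
'''

# Each match from searchString wraps one Group holding the named results.
pairs = {m[0]["key"]: m[0]["value"] for m in kv.searchString(source)}
for k, v in pairs.items():
    print(k, "=>", repr(v))
```

Because `''.join` inserts nothing between the fragments, any space that should separate the two halves of a continued value has to be inside one of the quoted strings.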