pyparsing SkipTo and key = value input


I am trying to define an input parser using pyparsing. I have managed to get something like the following parsed correctly:

key = value

where value can be, for example, an integer. This is the Python code:

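import pyparsing as pp

# Assumed definition: the original snippet uses EQ without defining it.
# Suppressing '=' keeps it out of the parsed tokens.
EQ = pp.Literal('=').suppress()

_name = pp.Word(pp.alphas + '_', pp.alphanums + '_')
_key = _name + EQ
_value = pp.pyparsing_common.signed_integer

pp.dictOf(_key, _value)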

Now I would like to add a "raw" data input:

$raw
H 0.0 0.0 0.0
F 1.0 1.0 1.0
$end

where everything between the $raw line and $end is "gobbled up" into a single string, stored under the key raw. I have tried with:

import pyparsing as pp

SDATA = pp.Literal('$').suppress()
EDATA = pp.CaselessLiteral('$end').suppress()
data_t = pp.Combine(SDATA + pp.Word(pp.alphas + '_', pp.alphanums + '_')
                ) + pp.SkipTo(EDATA) + EDATA
data_t.setName('raw data')
data_t.setParseAction(lambda token: (token[0], token[1]))

and this works with the input string '$raw\nH 0.0 0.0 0.0\nF 1.0 1.0 1.0\n$end'. However, I can't manage to combine the key = value parser with data_t. Am I missing something obvious here, or is it just not possible to combine the two?

UPDATE

This is the test input:

$raw
H 0.0 0.0 0.0
F 1.0 1.0 1.0
$end

int = 42

and this is the way I am combining the key = value and "raw" data parsers:

parser = pp.dictOf(_key, _value) ^ data_t

with parsing then invoked as:

tokens = parser.parseString(keywords).asDict()

This returns an empty dict. Moving int = 42 above $raw ... $end returns just {'int': 42}.


Solution

  • Yowch! This is partially due to a bug in dictOf. To make progress on this, I defined the following:

    kv_dict = pp.Dict(pp.OneOrMore(pp.Group(_key + _value)))
    

    Then I defined my parser as:

    parser = pp.OneOrMore(pp.Group(data_t) | pp.Group(kv_dict))
    

    The grouping is important so that keys in one set don't step on those in another. (The bug in dictOf prevents using it directly inside a OneOrMore: dictOf allows an empty dict, so at the end of the string it keeps matching nothing and loops forever instead of terminating.)

    Finally, I use dump() to see the results instead of asDict(). asDict() will only show parsed tokens that are named, which your data_t is not.

    print(parser.parseString(sample).dump())
    

    Gives:

    [[('raw', 'H 0.0 0.0 0.0\nF 1.0 1.0 1.0\n')], [['int', 42]]]
    [0]:
      [('raw', 'H 0.0 0.0 0.0\nF 1.0 1.0 1.0\n')]
    [1]:
      [['int', 42]]
      - int: 42
    

    If you want to add names in data_t, change the definition to this, and drop the parse action:

    data_t = pp.Combine(SDATA + pp.Word(pp.alphas + '_', pp.alphanums + '_')
                    )("name") + pp.SkipTo(EDATA)("body") + EDATA
    

    Now I get:

    [['raw', 'H 0.0 0.0 0.0\nF 1.0 1.0 1.0\n'], [['int', 42]]]
    [0]:
      ['raw', 'H 0.0 0.0 0.0\nF 1.0 1.0 1.0\n']
      - body: 'H 0.0 0.0 0.0\nF 1.0 1.0 1.0\n'
      - name: 'raw'
    [1]:
      [['int', 42]]
      - int: 42
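
If what you ultimately want is one flat dict with raw and int side by side, a variation on the above is to let both alternatives produce a [key, value] group and wrap the whole repetition in a single Dict. This is only a sketch, not part of the answer's tested code: EQ is assumed to be a suppressed '=' literal, and kv_pair, raw_block and flat_parser are names made up for illustration. No parse action and no results names are used on the raw block here.

```python
import pyparsing as pp

# Assumption: EQ is a suppressed '=' literal (the question never defines it).
EQ = pp.Literal('=').suppress()
_name = pp.Word(pp.alphas + '_', pp.alphanums + '_')
_value = pp.pyparsing_common.signed_integer

SDATA = pp.Literal('$').suppress()
EDATA = pp.CaselessLiteral('$end').suppress()

# Hypothetical names: each alternative yields one [key, value] group.
kv_pair = pp.Group(_name + EQ + _value)
raw_block = pp.Group(pp.Combine(SDATA + _name) + pp.SkipTo(EDATA) + EDATA)

# Dict takes the first token of every group as the key for the rest of it.
# OneOrMore (not ZeroOrMore) means the repetition cannot match empty input.
flat_parser = pp.Dict(pp.OneOrMore(raw_block | kv_pair))

sample = """\
$raw
H 0.0 0.0 0.0
F 1.0 1.0 1.0
$end

int = 42
"""

settings = flat_parser.parseString(sample).asDict()
print(settings)
```

With the test input from the question, settings should come back with the two keys raw and int. Because everything now lives in one Dict, the earlier caveat about keys stepping on each other applies: a raw block and a key = value pair with the same name end up as a single entry.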