I am trying to define an input parser using pyparsing. I have managed to get something like the following parsed correctly:
key = value
where value can be, for example, an integer. This is the Python code:
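import pyparsing as pp
EQ = pp.Suppress('=')  # EQ is not defined in the original snippet; a suppressed '=' is assumed
_name = pp.Word(pp.alphas + '_', pp.alphanums + '_')
_key = _name + EQ
_value = pp.pyparsing_common.signed_integer  # converts the matched text to int
pp.dictOf(_key, _value)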
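For reference, a self-contained version of this key = value parser could look like the sketch below. EQ is not defined in the snippet above, so `pp.Suppress("=")` is an assumption, and the sample input with a max_iter key is made up for illustration.

```python
import pyparsing as pp

# Assumption: EQ is not shown in the question; a suppressed "=" keeps
# the equals sign out of the parsed results.
EQ = pp.Suppress("=")

_name = pp.Word(pp.alphas + "_", pp.alphanums + "_")
_key = _name + EQ
_value = pp.pyparsing_common.signed_integer  # converts the text to int

kv_parser = pp.dictOf(_key, _value)

# Hypothetical sample input: one key = value pair per line.
result = kv_parser.parseString("int = 42\nmax_iter = -7")
print(result.asDict())  # {'int': 42, 'max_iter': -7}
```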
Now I would like to add a "raw" data input:
$raw
H 0.0 0.0 0.0
F 1.0 1.0 1.0
$end
where anything between the $raw header and $end is "gobbled up" into a string whose key will be raw, i.e. the name that follows the $. I have tried with:
import pyparsing as pp
SDATA = pp.Literal('$').suppress()
EDATA = pp.CaselessLiteral('$end').suppress()
data_t = (pp.Combine(SDATA + pp.Word(pp.alphas + '_', pp.alphanums + '_'))
          + pp.SkipTo(EDATA) + EDATA)
data_t.setName('raw data')
data_t.setParseAction(lambda token: (token[0], token[1]))
This works with the input string '$raw\nH 0.0 0.0 0.0\nF 1.0 1.0 1.0\n$end'. However, I can't manage to combine the key = value parser with data_t. Is there anything obvious I am missing here, or is it just not possible to combine the two?
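To see what the parse action hands back, the raw-data expression can be run on its own. This sketch reproduces the code above (only the layout differs) and unpacks the single tuple token it produces; the names `result`, `name` and `body` are only illustrative.

```python
import pyparsing as pp

SDATA = pp.Literal("$").suppress()
EDATA = pp.CaselessLiteral("$end").suppress()

# "$" immediately followed by the block name, e.g. "$raw" -> "raw",
# then everything up to (but not including) "$end".
data_t = (
    pp.Combine(SDATA + pp.Word(pp.alphas + "_", pp.alphanums + "_"))
    + pp.SkipTo(EDATA)
    + EDATA
)
data_t.setName("raw data")
# Collapse the two tokens into one (name, body) tuple.
data_t.setParseAction(lambda tokens: (tokens[0], tokens[1]))

result = data_t.parseString("$raw\nH 0.0 0.0 0.0\nF 1.0 1.0 1.0\n$end")
name, body = result[0]  # the tuple is the only token
print(name)  # raw
print(repr(body))
```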
UPDATE
This is the test input:
$raw
H 0.0 0.0 0.0
F 1.0 1.0 1.0
$end
int = 42
and this is how I am combining the key = value and "raw" data parsers:
parser = pp.dictOf(_key, _value) ^ data_t
with parsing then invoked as:
tokens = parser.parseString(keywords).asDict()
This returns an empty dict. Moving int = 42 above $raw ... $end returns just {'int': 42}.
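A likely explanation, sketched below under the assumption that EQ is a suppressed '=': `^` builds an Or, which tries each alternative at the current position and keeps only the one that matches the most text. With the raw block first, data_t wins, parseString stops after $end without complaining about the leftover "int = 42" (the whole input is only checked when parseAll=True is passed), and since data_t defines no results names, asDict() has nothing to report. With int = 42 first, the key = value alternative wins instead, which matches the {'int': 42} result.

```python
import pyparsing as pp

EQ = pp.Suppress("=")  # assumption: EQ is not shown in the question
_name = pp.Word(pp.alphas + "_", pp.alphanums + "_")
_key = _name + EQ
_value = pp.pyparsing_common.signed_integer

SDATA = pp.Literal("$").suppress()
EDATA = pp.CaselessLiteral("$end").suppress()
data_t = (
    pp.Combine(SDATA + pp.Word(pp.alphas + "_", pp.alphanums + "_"))
    + pp.SkipTo(EDATA)
    + EDATA
)
data_t.setParseAction(lambda tokens: (tokens[0], tokens[1]))

# Or: only ONE alternative is matched, then parsing stops.
parser = pp.dictOf(_key, _value) ^ data_t

sample = "$raw\nH 0.0 0.0 0.0\nF 1.0 1.0 1.0\n$end\nint = 42"

# Only the raw block is consumed; "int = 42" is silently left over,
# and the unnamed tuple token does not appear in asDict().
print(parser.parseString(sample).asDict())  # {}

# Requiring the whole input to match exposes the leftover text.
leftover_detected = False
try:
    parser.parseString(sample, parseAll=True)
except pp.ParseException as err:
    leftover_detected = True
    print("parseAll=True rejects the input:", err)
```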
Yowch! This is partially due to a bug in dictOf. To make progress on this, I defined the following:
kv_dict = pp.Dict(pp.OneOrMore(pp.Group(_key + _value)))
Then I defined my parser as:
parser = pp.OneOrMore(pp.Group(data_t) | pp.Group(kv_dict))
The grouping is important so that keys in one set don't step on those in another. (The bug in dictOf prevents including it like this inside a OneOrMore: since dictOf allows an empty dict, it loops forever at the end of the string instead of terminating.)
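Putting the pieces together, a runnable sketch of this fix might look as follows. The EQ definition and the sample string (the test input from the update) are assumptions; the rest follows the definitions above. Instead of dump(), it indexes into the groups directly, because the raw block's tuple and the key/value names live inside separate groups.

```python
import pyparsing as pp

EQ = pp.Suppress("=")  # assumption: not shown in the question
_name = pp.Word(pp.alphas + "_", pp.alphanums + "_")
_key = _name + EQ
_value = pp.pyparsing_common.signed_integer

SDATA = pp.Literal("$").suppress()
EDATA = pp.CaselessLiteral("$end").suppress()
data_t = (
    pp.Combine(SDATA + pp.Word(pp.alphas + "_", pp.alphanums + "_"))
    + pp.SkipTo(EDATA)
    + EDATA
)
data_t.setParseAction(lambda tokens: (tokens[0], tokens[1]))

# Same shape as dictOf, but spelled out with OneOrMore so it cannot
# match an empty string inside the outer OneOrMore.
kv_dict = pp.Dict(pp.OneOrMore(pp.Group(_key + _value)))

# Each block gets its own Group, so names from one block stay separate.
parser = pp.OneOrMore(pp.Group(data_t) | pp.Group(kv_dict))

sample = "$raw\nH 0.0 0.0 0.0\nF 1.0 1.0 1.0\n$end\nint = 42"
result = parser.parseString(sample)

raw_name, raw_body = result[0][0]  # tuple built by the parse action
print(raw_name)                    # raw
print(result[1].asDict())          # {'int': 42}
```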
Finally, I use dump() to see the results instead of asDict(). asDict() will only show parsed tokens that have results names, which your data_t does not.
print(parser.parseString(sample).dump())
Gives:
[[('raw', 'H 0.0 0.0 0.0\nF 1.0 1.0 1.0\n')], [['int', 42]]]
[0]:
[('raw', 'H 0.0 0.0 0.0\nF 1.0 1.0 1.0\n')]
[1]:
[['int', 42]]
- int: 42
If you want to add results names to data_t, change its definition to the following and drop the parse action:
data_t = (pp.Combine(SDATA + pp.Word(pp.alphas + '_', pp.alphanums + '_'))("name")
          + pp.SkipTo(EDATA)("body") + EDATA)
Now I get:
[['raw', 'H 0.0 0.0 0.0\nF 1.0 1.0 1.0\n'], [['int', 42]]]
[0]:
['raw', 'H 0.0 0.0 0.0\nF 1.0 1.0 1.0\n']
- body: 'H 0.0 0.0 0.0\nF 1.0 1.0 1.0\n'
- name: 'raw'
[1]:
[['int', 42]]
- int: 42
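If the goal is a single Python dict keyed by raw and int, as the question asked, one option is to walk the groups and merge them. The helper `to_plain_dict` below is hypothetical, not part of pyparsing; it relies on the named version of data_t and on the fact that raw-block groups start with a plain string while key/value groups start with a nested group. EQ is again assumed to be a suppressed '='.

```python
import pyparsing as pp

EQ = pp.Suppress("=")  # assumption: not shown in the question
_name = pp.Word(pp.alphas + "_", pp.alphanums + "_")
_key = _name + EQ
_value = pp.pyparsing_common.signed_integer

SDATA = pp.Literal("$").suppress()
EDATA = pp.CaselessLiteral("$end").suppress()
# Named results instead of a parse action
data_t = (
    pp.Combine(SDATA + pp.Word(pp.alphas + "_", pp.alphanums + "_"))("name")
    + pp.SkipTo(EDATA)("body")
    + EDATA
)
kv_dict = pp.Dict(pp.OneOrMore(pp.Group(_key + _value)))
parser = pp.OneOrMore(pp.Group(data_t) | pp.Group(kv_dict))


def to_plain_dict(result):
    """Hypothetical helper: merge every parsed group into one dict."""
    merged = {}
    for group in result:
        if isinstance(group[0], str):
            # Raw block: first token is the block name, e.g. "raw".
            merged[group["name"]] = group["body"]
        else:
            # key = value block: its names were attached by pp.Dict.
            merged.update(group.asDict())
    return merged


sample = "$raw\nH 0.0 0.0 0.0\nF 1.0 1.0 1.0\n$end\nint = 42"
config = to_plain_dict(parser.parseString(sample))
print(sorted(config))  # ['int', 'raw']
```

In this sketch a later block silently overwrites an earlier key with the same name, which may or may not be the behaviour you want.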