Pyparsing offers the ParseElementEnhance subclass DelimitedList for parsing (typically comma-separated) lists:
>>> kv_element = pp.Word(pp.alphanums)
>>> kv_list = pp.DelimitedList(kv_element)
>>> kv_list.parse_string('red, green, blue')
ParseResults(['red', 'green', 'blue'], {})
And it provides the TokenConverter subclass Dict for transforming a repeating expression into a dictionary:
>>> key = value = pp.Word(pp.alphanums)
>>> kv_pair = key + pp.Suppress("=") + value
>>> kv_dict = pp.Dict(pp.Group(kv_pair)[...])
>>> kv_dict.parse_string('R=red G=green B=blue')
ParseResults([
ParseResults(['R', 'red'], {}),
ParseResults(['G', 'green'], {}),
ParseResults(['B', 'blue'], {})
], {'R': 'red', 'G': 'green', 'B': 'blue'})
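To show what the dictionary half of that ParseResults buys you, here is a small sketch that looks values up by key and converts them with as_dict(). It also shows that when a Group holds more than one value token, everything after the key becomes that key's value. The RGB-triple grammar (rgb_value, rgb_dict) is a hypothetical example, not part of the original question.

```python
import pyparsing as pp

key = value = pp.Word(pp.alphanums)
kv_pair = key + pp.Suppress("=") + value
kv_dict = pp.Dict(pp.Group(kv_pair)[...])

result = kv_dict.parse_string("R=red G=green B=blue")
# Named results can be read by key or converted to a plain Python dict
print(result["G"])        # green
print(result.as_dict())   # {'R': 'red', 'G': 'green', 'B': 'blue'}

# Hypothetical multi-token values: the first token of each Group is the key,
# and the remaining tokens of the Group become its value
rgb_value = pp.Word(pp.nums)[1, ...]
rgb_dict = pp.Dict(pp.Group(key + pp.Suppress("=") + rgb_value)[...])
rgb = rgb_dict.parse_string("R=255 0 0 G=0 255 0")
print(rgb.as_dict())      # {'R': ['255', '0', '0'], 'G': ['0', '255', '0']}
```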
But combining them feels awkward. It's possible to build a working combined ParserElement that parses a dict out of a delimited list, but compared to the above it requires the DelimitedList to output Group()s, and the DelimitedList to be repeated ([...]) when constructing the Dict() around it, to appease the type checker.[1]
>>> kv_pair = key + pp.Suppress("=") + value
>>> kv_pairlist = pp.DelimitedList(pp.Group(kv_pair))
>>> kv_pairdict = pp.Dict(kv_pairlist[...])
>>> kv_pairdict.parse_string('R=red, G=green, B=blue')
ParseResults([
ParseResults(['R', 'red'], {}),
ParseResults(['G', 'green'], {}),
ParseResults(['B', 'blue'], {})
], {'R': 'red', 'G': 'green', 'B': 'blue'})
The whole effect reads like you're defining a parser to create a dictionary from a series of 1-element delimited lists, each containing a single key-value pair match. (In fact, I'm not entirely sure that isn't what's actually happening in the parser.)
Writing code that expresses the intent (a parser definition that matches a single delimited list containing a series of key-value pair matches) feels like a struggle against the API. (It is especially vexing that kv_pairdict = pp.Dict(kv_pairlist) works the same as the above but runs afoul of the type checker.)
Is there a cleaner way to express the intended parser definition, within the Pyparsing API? If not, is that a deficiency of my design, of Pyparsing's API, or something else?
(Do I have the definition inside out? DelimitedList(Dict(Group(kv_pair)[1, ...])) also works, but feels even more conceptually backwards to me. Then again, it doesn't involve nearly as much fighting against the API, so maybe I'm just looking at it wrong.)
[1] No overload variant of "dict" matches argument type "DelimitedList" mypy(call-overload)
Possible overload variants:
    def [_KT, _VT] __init__(self) -> dict[_KT, _VT]
    def [_KT, _VT] __init__(self, **kwargs: _VT) -> dict[str, _VT]
    def [_KT, _VT] __init__(self, SupportsKeysAndGetItem[_KT, _VT], /) -> dict[_KT, _VT]
    def [_KT, _VT] __init__(self, SupportsKeysAndGetItem[str, _VT], /, **kwargs: _VT) -> dict[str, _VT]
    def [_KT, _VT] __init__(self, Iterable[tuple[_KT, _VT]], /) -> dict[_KT, _VT]
    def [_KT, _VT] __init__(self, Iterable[tuple[str, _VT]], /, **kwargs: _VT) -> dict[str, _VT]
    def [_KT, _VT] __init__(self, Iterable[list[str]], /) -> dict[str, str]
    def [_KT, _VT] __init__(self, Iterable[list[bytes]], /) -> dict[bytes, bytes]
mypy(note)
Dict is meant to be constructed from a single ParserElement that represents a repetition of Group'ed ParserElements. It takes the text matched by the 0th element of each Group as that Group's key, and the remainder of the Group as the corresponding value. Typically, the repetition is done using OneOrMore or ZeroOrMore (or their newer slice-like notations [1, ...] or [...]). But it is perfectly suitable to use DelimitedList for this repetition, as long as the expression used for the repeated key-value pairs is a Group. See if this slight reworking of your code helps (I really just moved the Group up into the kv_pair definition):
import pyparsing as pp

key = pp.common.identifier  # keep the keys usable as attribute names
value = pp.Word(pp.alphanums)
kv_pair = pp.Group(key + pp.Suppress("=") + value)
kv_pairlist = pp.DelimitedList(kv_pair)
kv_pairdict = pp.Dict(kv_pairlist)
kv_pairdict.run_tests("""\
R=red, G=green, B=blue
"""
)
I saved this as dict_of_delimited_list.py. Adding the following lines generates dict_of_delimited_list_diagram.html, which contains a railroad diagram of your parser:
pp.autoname_elements()
kv_pairdict.create_diagram(f"{__file__.removesuffix('.py')}_diagram.html")
As for your note, I strongly suspect a problem in (or with) the type checker. The __init__ signature for Dict clearly takes a single ParserElement, not a key type and a value type, so I suspect the type checker sees "pp.Dict" and thinks "typing.Dict". I confirmed this with a modified version of pyparsing that renames Dict to DictOf and makes no other changes, and your insane-sounding type error went away. (Unfortunately, just doing from pyparsing import Dict as DictOf was not sufficient.) As a side note, I'd like to mention that pyparsing's Dict class predates typing.Dict by about 15 years; pyparsing had Dict first!