I'm trying to create a parser for generic Python function calls that separates out the args and kwargs. I've looked through the examples but couldn't find one that helps.
Here is an example of what I'd like to parse, and what I'd like the output to be after parsing with parseString().asDict():
example = "test(1, 2, 3, hello, a=4, stuff=there, d=5)"
results = xxx.parseString(example).asDict()
results
{'name': 'test', 'args': ['1', '2', '3', 'hello'], 'kwargs': {'a': '4', 'stuff': 'there', 'd': '5'}}
or
example = "test(a=4, stuff=there, d=5)"
results = xxx.parseString(example).asDict()
results
{'name': 'test', 'args': '', 'kwargs': {'a': '4', 'stuff': 'there', 'd': '5'}}
or
example = "test(1, 2, 3, hello)"
results = xxx.parseString(example).asDict()
results
{'name': 'test', 'args': ['1', '2', '3', 'hello'], 'kwargs': ''}
Both the arguments and the keyword arguments should be optional. For the moment I'm ignoring fully generic *args, **kwargs, nested-list inputs, etc. I managed to get something working when there are only args or only kwargs, but it fails when I have both.
import pyparsing as pp
LPAR = pp.Suppress('(')
RPAR = pp.Suppress(')')
# define generic number
number = pp.Regex(r"[+-]?\d+(?:\.\d*)?(?:[eE][+-]?\d+)?")
# define function arguments
arglist = pp.delimitedList(number | (pp.Word(pp.alphanums + '-_') + pp.NotAny('=')) )
args = pp.Group(arglist).setResultsName('args')
# define function keyword arguments
key = pp.Word(pp.alphas) + pp.Suppress('=')
values = (number | pp.Word(pp.alphas))
keyval = pp.dictOf(key, values)
kwarglist = pp.delimitedList(keyval)
kwargs = pp.Group(kwarglist).setResultsName('kwargs')
# build generic function
fxn_args = pp.Optional(args, default='') + pp.Optional(kwargs, default='')
fxn_name = (pp.Word(pp.alphas)).setResultsName('name')
fxn = pp.Group(fxn_name + LPAR + fxn_args + RPAR)
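A note on the number regex: spelled with [+-~] and (:?...), it does not mean the same thing as [+-] and (?:...). Inside a character class, +-~ is a range from '+' to '~' that also covers digits and letters, and (:?...) is an ordinary capturing group that begins with an optional ':'. A quick comparison of the two spellings with Python's re module (the sample strings are arbitrary, chosen for illustration):

```python
import re

# '+-~' inside a class is a RANGE from '+' to '~' (covers digits, letters, ':' ...),
# and '(:?...)' is a capturing group that starts with an optional ':'.
as_written = re.compile(r"[+-~]?\d+(:?\.\d*)?(:?[eE][+-]?\d+)?")
# '-' as the last char of a class is literal; '(?:...)' is a non-capturing group.
intended = re.compile(r"[+-]?\d+(?:\.\d*)?(?:[eE][+-]?\d+)?")

samples = ["-5", "3.14", "1e10", "x5", "5:.2"]
for s in samples:
    print(f"{s!r:8} as_written={bool(as_written.fullmatch(s))} "
          f"intended={bool(intended.fullmatch(s))}")
# 'x5' and '5:.2' are accepted only by the first spelling
```

Both spellings accept ordinary signed ints, decimals and exponents; the difference only shows up on stray leading characters or colons.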
And the results:
# parsing only kwargs
fxn.parseString('test(a=4, stuff=there, d=5)')[0].asDict()
{'name': 'test', 'args': '', 'kwargs': {'a': '4', 'stuff': 'there', 'd': '5'}}
# parsing only args
fxn.parseString('test(1, 2, 3, hello)')[0].asDict()
{'name': 'test', 'args': ['1', '2', '3', 'hello'], 'kwargs': ''}
# parsing both
fxn.parseString('test(1, 2, 3, hello, a=4, stuff=there, d=5)')[0].asDict()
...
ParseException: Expected ")", found ',' (at char 19), (line:1, col:20)
If I parse just fxn_args, the kwargs are simply missing altogether when args are also present:
# parse only kwargs
fxn_args.parseString('c=4, stuff=there, d=5.234').asDict()
{'args': '', 'kwargs': {'c': '4', 'stuff': 'there', 'd': '5.234'}}
# parse both args and kwargs
fxn_args.parseString('1, 2, 3, hello, c=4, stuff=there, d=5.234').asDict()
{'args': ['1', '2', '3', 'hello'], 'kwargs': ''}
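If both args and kwargs are present, your parser is tripping over the ',' between them. The delimitedList in args stops after hello, because its next candidate item, a, is followed by '=' and so is rejected by NotAny('='). That leaves ", a=4, stuff=there, d=5" for Optional(kwargs), which cannot match text that starts with a comma, so it matches nothing; fxn then expects ")" and finds ','.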
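To see exactly where the args list stops, here is a minimal sketch that scans only the args expression (rebuilt from the question's grammar, with the number regex spelled [+-] and (?:...)) over the argument text and prints what is left over. scanString yields (tokens, start, end) tuples, so end marks where the match finished:

```python
import pyparsing as pp

# The question's positional-args expression, rebuilt for this check.
number = pp.Regex(r"[+-]?\d+(?:\.\d*)?(?:[eE][+-]?\d+)?")
arglist = pp.delimitedList(number | (pp.Word(pp.alphanums + '-_') + pp.NotAny('=')))
args = pp.Group(arglist).setResultsName('args')

text = "1, 2, 3, hello, a=4, stuff=there, d=5"
# Take the first match; it starts at position 0.
tokens, start, end = next(args.scanString(text))
print(tokens.asDict())   # {'args': ['1', '2', '3', 'hello']}
print(repr(text[end:]))  # ', a=4, stuff=there, d=5'
```

The leftover text begins with the separating comma, which is exactly what kwargs cannot consume on its own.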
You can see this for yourself using pyparsing's runTests method:
fxn.runTests("""\
# parsing only kwargs
test(a=4, stuff=there, d=5)
# parsing only args
test(1, 2, 3, hello)
# parsing both
test(1, 2, 3, hello, a=4, stuff=there, d=5)
""")
This will print:
# parsing only kwargs
test(a=4, stuff=there, d=5)
[['test', '', [['a', 4], ['stuff', 'there'], ['d', 5]]]]
[0]:
  ['test', '', [['a', 4], ['stuff', 'there'], ['d', 5]]]
  - args: ''
  - kwargs: [['a', 4], ['stuff', 'there'], ['d', 5]]
    - a: 4
    - d: 5
    - stuff: 'there'
  - name: 'test'
# parsing only args
test(1, 2, 3, hello)
[['test', [1, 2, 3, 'hello'], '']]
[0]:
  ['test', [1, 2, 3, 'hello'], '']
  - args: [1, 2, 3, 'hello']
  - kwargs: ''
  - name: 'test'
# parsing both
test(1, 2, 3, hello, a=4, stuff=there, d=5)
                   ^
FAIL: Expected ")", found ',' (at char 19), (line:1, col:20)
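The easiest fix is to try an explicit "args, comma, kwargs" alternative first and fall back to the all-optional form (suppressing the comma keeps it out of the results):
# '+' binds tighter than '|': (args , kwargs) | (optional args + optional kwargs)
fxn_args = args + pp.Suppress(',') + kwargs | pp.Optional(args, default='') + pp.Optional(kwargs, default='')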
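Assembled with that fix, the question's grammar becomes roughly the following. This is a minimal sketch: the number regex is spelled with [+-] and (?:...), and the separating comma is suppressed. It parses all three example calls from the question:

```python
import pyparsing as pp

LPAR = pp.Suppress('(')
RPAR = pp.Suppress(')')
COMMA = pp.Suppress(',')

# optional sign, digits, optional fraction, optional exponent
number = pp.Regex(r"[+-]?\d+(?:\.\d*)?(?:[eE][+-]?\d+)?")

# positional args: a number, or a bare word that is NOT followed by '='
arglist = pp.delimitedList(number | (pp.Word(pp.alphanums + '-_') + pp.NotAny('=')))
args = pp.Group(arglist).setResultsName('args')

# keyword args: key=value pairs, collected as named results by dictOf
key = pp.Word(pp.alphas) + pp.Suppress('=')
values = number | pp.Word(pp.alphas)
kwarglist = pp.delimitedList(pp.dictOf(key, values))
kwargs = pp.Group(kwarglist).setResultsName('kwargs')

# try "args, kwargs" first; otherwise either part alone (or neither)
fxn_args = (args + COMMA + kwargs) | (pp.Optional(args, default='') + pp.Optional(kwargs, default=''))
fxn_name = pp.Word(pp.alphas).setResultsName('name')
fxn = pp.Group(fxn_name + LPAR + fxn_args + RPAR)

for text in ["test(1, 2, 3, hello, a=4, stuff=there, d=5)",
             "test(a=4, stuff=there, d=5)",
             "test(1, 2, 3, hello)"]:
    print(fxn.parseString(text)[0].asDict())
```

Ordering matters here: the combined alternative has to come first, because the all-optional alternative succeeds on "1, 2, 3, hello" alone and would otherwise leave the kwargs unparsed.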
You might also find that identifiers are not just Word(alphas): they can also contain '_' and (after the first character) digits. There is an identifier expression in the pyparsing_common namespace class included with pyparsing:
ppc = pp.pyparsing_common
ident = ppc.identifier()
number = ppc.number()
number will also auto-convert the parsed values to int or float.
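A sketch of the same grammar rebuilt on ppc.identifier and ppc.number, to show the int/float conversion. The sample call string is made up for illustration, and the map(pp.Suppress, ...) shorthand is just a compact way to define the punctuation:

```python
import pyparsing as pp

ppc = pp.pyparsing_common
LPAR, RPAR, COMMA, EQ = map(pp.Suppress, "(),=")

ident = ppc.identifier   # letters, digits and '_', not starting with a digit
number = ppc.number      # parse actions convert matches to int or float

# positional arg: a number, or an identifier that is not followed by '='
arg = number | (ident + pp.NotAny('='))
args = pp.Group(pp.delimitedList(arg)).setResultsName('args')

# keyword args: ident=value pairs collected as named results
kwargs = pp.Group(pp.delimitedList(pp.dictOf(ident + EQ, number | ident))).setResultsName('kwargs')

fxn_args = (args + COMMA + kwargs) | (pp.Optional(args, default='') + pp.Optional(kwargs, default=''))
fxn = pp.Group(ident('name') + LPAR + fxn_args + RPAR)

result = fxn.parseString("test(1, 2.5, hello_2, a=4, stuff=there, d=5e3)")[0].asDict()
print(result)
# {'name': 'test', 'args': [1, 2.5, 'hello_2'], 'kwargs': {'a': 4, 'stuff': 'there', 'd': 5000.0}}
```

Since the conversion happens in parse actions, the values in asDict() are already Python numbers, so no post-processing of the strings is needed.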