pyparsing generic python function args and kwargs


I'm trying to create a parser for generic Python functions separating out args and kwargs. I've looked through the examples but couldn't find one that helps.

Here is an example of what I'd like to parse and what I'd like the output to be after I parse with parseString().asDict().

example = "test(1, 2, 3, hello, a=4, stuff=there, d=5)"
results = xxx.parseString(example).asDict()
results
{'name': 'test', 'args': ['1', '2', '3', 'hello'], 'kwargs': {'a': '4', 'stuff': 'there', 'd': '5'}}

or 
example = "test(a=4, stuff=there, d=5)"
results = xxx.parseString(example).asDict()
results
{'name': 'test', 'args': '', 'kwargs': {'a': '4', 'stuff': 'there', 'd': '5'}}

or
example = "test(1, 2, 3, hello)"
results = xxx.parseString(example).asDict()
results
{'name': 'test', 'args': ['1', '2', '3', 'hello'], 'kwargs': ''}

Both the arguments and keyword arguments should be optional, and for the moment I'm ignoring fully generic *args, **kwargs, nested input lists, etc. I managed to get something working when there are only args or only kwargs, but it fails when I have both.

import pyparsing as pp

LPAR = pp.Suppress('(')
RPAR = pp.Suppress(')')

# define generic number
number = pp.Regex(r"[+\-~]?\d+(?:\.\d*)?(?:[eE][+-]?\d+)?")

# define function arguments
arglist = pp.delimitedList(number | (pp.Word(pp.alphanums + '-_') + pp.NotAny('=')) )
args = pp.Group(arglist).setResultsName('args')

# define function keyword arguments
key = pp.Word(pp.alphas) + pp.Suppress('=')
values = (number | pp.Word(pp.alphas))
keyval = pp.dictOf(key, values)
kwarglist = pp.delimitedList(keyval)
kwargs = pp.Group(kwarglist).setResultsName('kwargs')

# build generic function
fxn_args = pp.Optional(args, default='') + pp.Optional(kwargs, default='')
fxn_name = (pp.Word(pp.alphas)).setResultsName('name')
fxn = pp.Group(fxn_name + LPAR + fxn_args + RPAR)

And the results

# parsing only kwargs
fxn.parseString('test(a=4, stuff=there, d=5)')[0].asDict()
{'name': 'test', 'args': '', 'kwargs': {'a': '4', 'stuff': 'there', 'd': '5'}}

# parsing only args
fxn.parseString('test(1, 2, 3, hello)')[0].asDict()
{'name': 'test', 'args': ['1', '2', '3', 'hello'], 'kwargs': ''}

# parsing both
fxn.parseString('test(1, 2, 3, hello, a=4, stuff=there, d=5)')[0].asDict()
...
ParseException: Expected ")", found ','  (at char 19), (line:1, col:20)

If I parse with just fxn_args, the kwargs are simply missing:

# parse only kwargs
fxn_args.parseString('c=4, stuff=there, d=5.234').asDict()
{'args': '', 'kwargs': {'c': '4', 'stuff': 'there', 'd': '5.234'}}

# parse both args and kwargs
fxn_args.parseString('1, 2, 3, hello, c=4, stuff=there, d=5.234').asDict()
{'args': ['1', '2', '3', 'hello'], 'kwargs': ''}

Solution

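  • If both args and kwargs are present, your parser trips over the ',' between them: arglist stops after the last positional argument, and nothing in fxn_args consumes the comma that follows, so kwargs never gets a chance to match.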

    You can see this for yourself using pyparsing's runTests method:

    fxn.runTests("""\
        # parsing only kwargs
        test(a=4, stuff=there, d=5)
    
        # parsing only args
        test(1, 2, 3, hello)
    
        # parsing both
        test(1, 2, 3, hello, a=4, stuff=there, d=5)
    """)
    

    This prints:

    # parsing only kwargs
    test(a=4, stuff=there, d=5)
    [['test', '', [['a', 4], ['stuff', 'there'], ['d', 5]]]]
    [0]:
      ['test', '', [['a', 4], ['stuff', 'there'], ['d', 5]]]
      - args: ''
      - kwargs: [['a', 4], ['stuff', 'there'], ['d', 5]]
        - a: 4
        - d: 5
        - stuff: 'there'
      - name: 'test'
    
    # parsing only args
    test(1, 2, 3, hello)
    [['test', [1, 2, 3, 'hello'], '']]
    [0]:
      ['test', [1, 2, 3, 'hello'], '']
      - args: [1, 2, 3, 'hello']
      - kwargs: ''
      - name: 'test'
    
    # parsing both
    test(1, 2, 3, hello, a=4, stuff=there, d=5)
                       ^
    FAIL: Expected ")", found ','  (at char 19), (line:1, col:20)
    

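    The easiest fix is to add an alternative that explicitly matches the comma between args and kwargs, tried before the existing optional form:

    fxn_args = (args + pp.Suppress(',') + kwargs) | (pp.Optional(args, default='') + pp.Optional(kwargs, default=''))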
    

    You might also find that identifiers are not just Word(alphas) but can also contain '_' and digits. pyparsing includes an identifier expression in its pyparsing_common namespace class:

    ppc = pp.pyparsing_common
    ident = ppc.identifier()
    number = ppc.number()
    

    number will also automatically convert matched values to int or float.
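
Putting the pieces together, here is a minimal sketch of the whole grammar with the comma fix applied and the pyparsing_common expressions swapped in. It keeps the structure from the question; the COMMA and EQ names and the parse_call helper are assumptions added only for illustration, and the comma is suppressed so it does not show up in the parsed tokens.

```python
import pyparsing as pp

ppc = pp.pyparsing_common

# Punctuation is suppressed so it never appears in the results.
LPAR, RPAR, COMMA, EQ = map(pp.Suppress, "(),=")

# identifier allows letters, digits and '_'; number converts to int or float.
ident = ppc.identifier
number = ppc.number

# Positional arguments: a number, or an identifier NOT followed by '='.
arglist = pp.delimitedList(number | (ident + pp.NotAny("=")))
args = pp.Group(arglist).setResultsName("args")

# Keyword arguments: key=value pairs collected with dictOf, as in the question.
keyval = pp.dictOf(ident + EQ, number | ident)
kwargs = pp.Group(pp.delimitedList(keyval)).setResultsName("kwargs")

# Try "args, kwargs" first; otherwise fall back to either one alone (or neither).
fxn_args = (args + COMMA + kwargs) | (
    pp.Optional(args, default="") + pp.Optional(kwargs, default="")
)
fxn = pp.Group(ident.setResultsName("name") + LPAR + fxn_args + RPAR)


def parse_call(text):
    """Hypothetical helper: parse one call string into a plain dict."""
    return fxn.parseString(text, parseAll=True)[0].asDict()


print(parse_call("test(1, 2, 3, hello, a=4, stuff=there, d=5)"))
```

Because ppc.number converts its matches, numeric values should come back as int or float rather than the strings shown in the question. If you need strings, keep the original Regex-based number instead.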