
pyparsing - Parsing a function call to get function name and argument list


I have some complicated function calls that I want to parse to get the function name and the argument list. Examples of the function calls are below:

1) extend(lambda x: 'xxxx' if t='xx' else t.replace('a','').replace('b',''), ['col_name1'], 'col_name', 'string')

2) restrict(lambda x:x !=0, ['col'])

I have tried to parse these using a regex, but so far it fails to split the argument list properly. I am new to pyparsing, so any help is appreciated.


Solution

  • Those are some pretty complicated argument lists. If you are trying to fully parse them, this will be a pretty big job.

    However, if you just want the function name and a string of the arguments passed in, then pyparsing offers some easy shortcuts.

    If you keep your plan very high-level, you can write your BNF as:

    function_call ::= identifier '(' arguments ')'
    identifier ::= word starting with alpha or '_', followed by zero or more alphanums or '_'
    arguments ::= (let's not worry about this for the moment)
    

    If we just consider the arguments as a list of items that may be nested in one or more levels of parentheses, then we can use pyparsing's nestedExpr helper to capture them.

    import pyparsing as pp
    
    identifier = pp.Word('_' + pp.alphas, '_' + pp.alphanums)
    arg_list = pp.nestedExpr()  # nesting delimiters default to '(' and ')'
    function_call = identifier("name") + arg_list("args")
    
    tests = """\
        extend(lambda x: 'xxxx' if t='xx' else t.replace('a','').replace('b',''), ['col_name1'], 'col_name', 'string')
        restrict(lambda x:x !=0, ['col'])"""
    
    function_call.runTests(tests)
    

    Prints:

    extend(lambda x: 'xxxx' if t='xx' else t.replace('a','').replace('b',''), ['col_name1'], 'col_name', 'string')
    ['extend', ['lambda', 'x:', "'xxxx'", 'if', 't=', "'xx'", 'else', 't.replace', ["'a'", ',', "''"], '.replace', ["'b'", ',', "''"], ',', '[', "'col_name1'", '],', "'col_name'", ',', "'string'"]]
    - args: [['lambda', 'x:', "'xxxx'", 'if', 't=', "'xx'", 'else', 't.replace', ["'a'", ',', "''"], '.replace', ["'b'", ',', "''"], ',', '[', "'col_name1'", '],', "'col_name'", ',', "'string'"]]
      [0]:
        ['lambda', 'x:', "'xxxx'", 'if', 't=', "'xx'", 'else', 't.replace', ["'a'", ',', "''"], '.replace', ["'b'", ',', "''"], ',', '[', "'col_name1'", '],', "'col_name'", ',', "'string'"]
        [0]:
          lambda
        [1]:
          x:
        [2]:
          'xxxx'
        ... every word in the args broken out separately
        [16]:
          ,
        [17]:
          'string'
    - name: 'extend'
    
    
    restrict(lambda x:x !=0, ['col'])
    ['restrict', ['lambda', 'x:x', '!=0,', '[', "'col'", ']']]
    - args: [['lambda', 'x:x', '!=0,', '[', "'col'", ']']]
      [0]:
        ['lambda', 'x:x', '!=0,', '[', "'col'", ']']
    - name: 'restrict'
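
Besides printing with runTests, you can read the named results directly in your own code. A minimal sketch using the same grammar on the second test string; `result` is just an illustrative variable name, and the expected list matches the runTests output above:

```python
import pyparsing as pp

identifier = pp.Word("_" + pp.alphas, "_" + pp.alphanums)
arg_list = pp.nestedExpr()  # nesting delimiters default to '(' and ')'
function_call = identifier("name") + arg_list("args")

result = function_call.parseString("restrict(lambda x:x !=0, ['col'])")
print(result["name"])   # restrict
print(result.asList())  # ['restrict', ['lambda', 'x:x', '!=0,', '[', "'col'", ']']]
```

Note that nestedExpr splits on whitespace and parentheses, not on commas, which is why tokens like '!=0,' and '],' still carry their commas.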
    

    If you just want the args list as a string, then you can wrap it in pyparsing's originalTextFor. Change arg_list to:

    arg_list = pp.originalTextFor(pp.nestedExpr())
    

    Now rerunning the tests gives:

    extend(lambda x: 'xxxx' if t='xx' else t.replace('a','').replace('b',''), ['col_name1'], 'col_name', 'string')
    ['extend', "(lambda x: 'xxxx' if t='xx' else t.replace('a','').replace('b',''), ['col_name1'], 'col_name', 'string')"]
    - args: "(lambda x: 'xxxx' if t='xx' else t.replace('a','').replace('b',''), ['col_name1'], 'col_name', 'string')"
    - name: 'extend'
    
    
    restrict(lambda x:x !=0, ['col'])
    ['restrict', "(lambda x:x !=0, ['col'])"]
    - args: "(lambda x:x !=0, ['col'])"
    - name: 'restrict'
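
With the originalTextFor version, the "args" result is a plain string holding the exact source text, so it can be used directly. A minimal sketch; the variable names `name`, `args_text`, and `inner` are illustrative only:

```python
import pyparsing as pp

identifier = pp.Word("_" + pp.alphas, "_" + pp.alphanums)
# originalTextFor returns the exact source text matched by nestedExpr
arg_list = pp.originalTextFor(pp.nestedExpr())
function_call = identifier("name") + arg_list("args")

result = function_call.parseString("restrict(lambda x:x !=0, ['col'])")
name = result["name"]       # 'restrict'
args_text = result["args"]  # "(lambda x:x !=0, ['col'])"
inner = args_text[1:-1]     # slice off the enclosing parentheses
print(name, inner)
```

Slicing only removes the outer parentheses; the commas between arguments are still embedded in the string.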
    

    If you want to remove the opening and closing parentheses and split the arguments at the delimiting commas, that is an exercise left for the reader/OP. (In that case, you may want to go back to the first version and work with the parsed-out bits of the arg list instead of the all-one-string version.)
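
One possible approach to that exercise, offered as a sketch rather than the definitive solution, is to describe a single argument as a run of quoted strings, balanced (...) groups, balanced [...] groups, and any other non-delimiter characters, then let delimitedList split on the commas that remain at the top level. The names `arg_piece` and `arg` are illustrative. This assumes the arguments contain no {...} dict literals, triple-quoted strings, or unbalanced brackets, none of which the sketch handles:

```python
import pyparsing as pp

LPAR, RPAR = map(pp.Suppress, "()")
identifier = pp.Word("_" + pp.alphas, "_" + pp.alphanums)

# One piece of an argument: a quoted string, a balanced (...) or [...] group,
# or a run of characters that are not quotes, brackets, or commas.
# Commas inside quotes or brackets are consumed by those pieces, so only
# top-level commas are left to separate arguments.
# Assumption: no {...} literals or triple-quoted strings in the arguments.
arg_piece = (
    pp.quotedString
    | pp.nestedExpr("(", ")")
    | pp.nestedExpr("[", "]")
    | pp.CharsNotIn("()[],'\"")
)
# Report each argument as the original source text it spans, trimmed of spaces.
arg = pp.originalTextFor(pp.OneOrMore(arg_piece))
arg.addParseAction(lambda t: t[0].strip())

function_call = (
    identifier("name")
    + LPAR
    + pp.Group(pp.Optional(pp.delimitedList(arg)))("args")
    + RPAR
)

tests = [
    "extend(lambda x: 'xxxx' if t='xx' else t.replace('a','').replace('b',''), ['col_name1'], 'col_name', 'string')",
    "restrict(lambda x:x !=0, ['col'])",
]
for test in tests:
    result = function_call.parseString(test)
    print(result["name"], result["args"].asList())
```

Each argument comes back as a string of its original source text (for example "lambda x:x !=0" and "['col']"), which you could then hand to a more detailed grammar or to Python's own ast module if you need to go further.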