pyparsing: split chain of method calls into top-level parts


If I have a chain of method calls in Python, how do I extract the top level calls using pyparsing?

TL;DR: the function should behave like this:

_parse_commands("df.hi()[['fi']](__call__).NI(ni='NI!')")
['df', '.hi()', "[['fi']]", '(__call__)', ".NI(ni='NI!')"]

I have not even been able to parse a method call properly:

from pyparsing import Word, alphas, nums, Literal, alphanums, printables, Optional, locatedExpr, originalTextFor, SkipTo

identifier = Word(alphas + '_', alphanums + '_').setName("identifier")
lparen = Literal("(")
rparen = Literal(")")
function_call = identifier + lparen + Optional(printables) + rparen

function_call.parseString("hi()")
# (['hi', '(', ')'], {})
# but
function_call.parseString("hi(ho)")
# ...
# ParseException: Expected ")" (at char 3), (line:1, col:4)

The problem is that I cannot find a way to tell pyparsing to "fetch me anything between the delimiters", which is what I am attempting with printables above. I have also tried originalTextFor to solve the same problem.

Also, if the answer could use locatedExpr to give the locations of the function calls, that would be swell.


Solution

  • Fully parsing these expressions is not trivial, since you would need a grammar for pretty much any kind of Python expression.

    But since you only want to split on the nested parentheses, you can use the pyparsing builtin nestedExpr() (which defaults to nested ()'s) together with scanString to scan the input string for matches. Each match is a tuple of the tokens, the start location, and the end location. If you keep track of the end of the previous match (last_e), you can reconstruct the intervening text for each new match by slicing from last_e to the current start (s):

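    import pyparsing as pp

    src = "df.hi()[['fi']](__call__).NI(ni='NI!')"

    last_e = 0
    for t, s, e in pp.nestedExpr().scanString(src):
        print(src[last_e:s])  # text between the previous match and this one
        print(s)              # start location of the match
        print(t.asList())     # parsed tokens as nested lists
        print(src[s:e])       # original text of the match
        print(e)              # end location of the match
        print()
        last_e = e

    # get whatever is left after the last parens
    print(src[last_e:])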
    

    Prints:

    df.hi
    5
    [[]]
    ()
    7
    
    [['fi']]
    15
    [['__call__']]
    (__call__)
    25
    
    .NI
    28
    [['ni=', "'NI!'"]]
    (ni='NI!')
    38
    

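    From here you should be able to get the bits you want. Note that [['fi']] shows up as intervening text rather than as a match, because nestedExpr() only looks for ()'s by default.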