
Split concatenated functions keeping the delimiters


I am trying to split strings containing Python functions so that each function ends up as a separate list element.
For example, s='hello()there()' should be split into ['hello()', 'there()'].
To do so, I use a regex lookahead to split on the closing parenthesis, except when it is at the end of the string.

While the lookahead seems to work, I cannot keep the ) in the resulting strings as suggested in various posts. Simply splitting with the regex discards the separator:

import re
s='hello()there()'
t=re.split(r"\)(?!$)", s)

This results in ['hello(', 'there()'].

s='hello()there()'
t=re.split(r"(\))(?!$)", s)

Wrapping the separator in a capturing group keeps the ), but as a separate element: ['hello(', ')', 'there()']. The same happens with this approach using the filter() function:

s='hello()there()'
u = list(filter(None, re.split(r"(\))(?!$)", s)))

resulting again in the parenthesis as a separate element: ['hello(', ')', 'there()']

How can I split such a string so that the functions remain intact in the output?


Solution

  • Use re.findall()

    1. \w+\(\) matches one or more word characters followed by an opening and a closing parenthesis. That pattern matches both hello() and there().
    t = re.findall(r"\w+\(\)", s)
    

    ['hello()', 'there()']
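
If you would rather keep re.split() as in the question, one alternative is to split on a zero-width position instead of on the ) itself. This is only a sketch, and it assumes Python 3.7 or later, where re.split() can split on empty matches. The variable name parts is just for illustration.

```python
import re

s = "hello()there()"

# (?<=\)) is a zero-width lookbehind: it matches the empty position right
# after a ")" without consuming it, so nothing is removed from the pieces.
# (?!$) skips the position at the very end of the string.
parts = re.split(r"(?<=\))(?!$)", s)
print(parts)
```

Because the split point has zero width, the ) stays attached to the preceding piece and this prints ['hello()', 'there()'].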
    

    Edit:
    If the parentheses can contain arguments, use .*? between them. .*? is a non-greedy match, meaning it matches as few characters as possible inside the parentheses.

    s = "hello(x, ...)there(6L)now()"
    
    t = re.findall(r"\w+\(.*?\)", s)       
    print(t)
    

    ['hello(x, ...)', 'there(6L)', 'now()']
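
To see why the ? matters, it may help to compare the greedy and non-greedy versions on the same string. A small sketch (the names greedy and lazy are just for illustration):

```python
import re

s = "hello(x, ...)there(6L)now()"

# Greedy: .* runs to the end of the string and backtracks only as far as
# the LAST ")", so all three calls are swallowed into a single match.
greedy = re.findall(r"\w+\(.*\)", s)

# Non-greedy: .*? stops at the FIRST ")" it can, giving one match per call.
lazy = re.findall(r"\w+\(.*?\)", s)

print(greedy)  # ['hello(x, ...)there(6L)now()']
print(lazy)    # ['hello(x, ...)', 'there(6L)', 'now()']
```

Note that the non-greedy pattern stops at the first ), so a nested call such as f(g(x)) would be cut short. It works for flat calls like the ones in the question.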