
Split strings and capture groups in Python


I have the following string:

'Cc1cc([N+](=O)[O-])ccc1OCC(C)(O)CN1CCN(Cc2ccccc2)CC1'

and want to capture [N+] and [O-], that is, splitting and recovering them. I do not seem to be able to recover them by using re.split.

re.split(r'\[[^\]]*\]','Cc1cc([N+](=O)[O-])ccc1OCC(C)(O)CN1CCN(Cc2ccccc2)CC1')

output:
['Cc1cc(', '(=O)', ')ccc1OCC(C)(O)CN1CCN(Cc2ccccc2)CC1']

and I am looking for something like this:

['Cc1cc(', '[N+]','(=O)','[O-]', ')ccc1OCC(C)(O)CN1CCN(Cc2ccccc2)CC1']

I am aware of related questions such as "Splitting on regex without removing delimiters" and "In Python, how do I split a string and keep the separators?"


Solution

  • If you wrap the pattern passed to re.split in parentheses, it becomes a capturing group, and re.split then includes the matched delimiters in the result, which gives the desired output:

    import re

    s = 'Cc1cc([N+](=O)[O-])ccc1OCC(C)(O)CN1CCN(Cc2ccccc2)CC1'

    # Raw string avoids invalid-escape warnings; the outer (...) is the capturing group
    re.split(r'(\[[^\]]*\])', s)

    output:
    ['Cc1cc(', '[N+]', '(=O)', '[O-]', ')ccc1OCC(C)(O)CN1CCN(Cc2ccccc2)CC1']
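
One thing to watch for: when a bracketed token sits at the very start or end of the string, or two of them are adjacent, re.split returns empty strings at those positions. A minimal sketch of one way to handle that, assuming you simply want to drop the empty pieces (the helper name split_keep_brackets is hypothetical, not part of the re module):

```python
import re

# Hypothetical helper: split on bracketed tokens such as [N+] and keep them.
def split_keep_brackets(text):
    # The capturing group makes re.split return the delimiters as well.
    parts = re.split(r'(\[[^\]]*\])', text)
    # Leading, trailing, or adjacent delimiters yield empty strings; drop them.
    return [p for p in parts if p]

print(split_keep_brackets('[N+](=O)[O-]'))
# ['[N+]', '(=O)', '[O-]']
```

Without the filter, the same input would produce an empty string at each end of the list, because the string both starts and ends with a delimiter.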