How do I split a string and keep the separators using Python's re library?


Here is my code:

import re
string = r"('Option A' | 'Option B') & ('Option C' | 'Option D')"
word_list = re.split(r"[\(.\)]", string)
-> ['', "'Option A' | 'Option B'", ' & ', "'Option C' | 'Option D'", '']

I want the following result:

-> ["('Option A' | 'Option B')", ' & ', "('Option C' | 'Option D')"]

Solution

  • You can use re.findall to extract each parenthesized group (this returns only the groups, not the text between them):

    import re
    string = r"('Option A' | 'Option B') & ('Option C' | 'Option D')"
    pattern = r"(\([^\)]+\))"
    re.findall(pattern, string)
    # ["('Option A' | 'Option B')", "('Option C' | 'Option D')"]
    

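    The same pattern also works with re.split. Because the pattern is wrapped in a capture group, the matched separators are kept in the result: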

    re.split(pattern, string)
    # ['', "('Option A' | 'Option B')", ' & ', "('Option C' | 'Option D')", '']
    

    To drop the empty strings that re.split produces at the start and end, filter them out:

    [s for s in re.split(pattern, string) if s]
    # ["('Option A' | 'Option B')", ' & ', "('Option C' | 'Option D')"]
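The capture group is what makes re.split keep the separators. As a small sketch of that behavior, the snippet below runs the same pattern with and without the outer parentheses; the variable names with_group and without_group are only for illustration.

```python
import re

string = "('Option A' | 'Option B') & ('Option C' | 'Option D')"

# With a capture group, the text matched by the group is included in the result.
with_group = re.split(r"(\([^)]+\))", string)

# Without a capture group, the matched text is discarded.
without_group = re.split(r"\([^)]+\)", string)

print(with_group)
# ['', "('Option A' | 'Option B')", ' & ', "('Option C' | 'Option D')", '']
print(without_group)
# ['', ' & ', '']
```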
    

    How the pattern works:

    • ( begins the capture group
    • \( matches the character ( literally
    • [^\)]+ matches one or more characters that are not )
    • \) matches the character ) literally
    • ) ends the capture group
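
The breakdown above can also be written into the pattern itself with re.VERBOSE, which lets each part carry its own comment. This is a minimal sketch, not part of the original answer: GROUP_PATTERN and split_keep_groups are hypothetical names, and the helper simply combines the re.split call with the empty-string filter shown earlier.

```python
import re

# Same pattern as in the answer, spelled out with re.VERBOSE.
GROUP_PATTERN = re.compile(
    r"""
    (        # begin capture group, so re.split keeps the match
      \(     # a literal opening parenthesis
      [^)]+  # one or more characters that are not a closing parenthesis
      \)     # a literal closing parenthesis
    )        # end capture group
    """,
    re.VERBOSE,
)


def split_keep_groups(text):
    """Hypothetical helper: split on parenthesized groups, keep them, drop empty strings."""
    return [part for part in GROUP_PATTERN.split(text) if part]


string = "('Option A' | 'Option B') & ('Option C' | 'Option D')"
print(split_keep_groups(string))
# ["('Option A' | 'Option B')", ' & ', "('Option C' | 'Option D')"]
```

Note that this pattern assumes the groups are not nested and not empty: [^)]+ stops at the first closing parenthesis and needs at least one character inside the parentheses.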