python, regex, nested-groups

How to get nested-groups with regexp


I need help with the following regex. I have this text:

"[Hello|Hi]. We are [inviting | calling] you at position [[junior| mid junior]|senior] developer."

Using a regex, I want to get:

[Hello|Hi]
[inviting | calling]
[[junior| mid junior]|senior]

The following regex: (\[[^\[$\]\]]*\])

gives me: [Hello|Hi] [inviting | calling] [junior| mid junior]

How should I fix it to get the correct output?


Solution

  • Let's define your string and import re:

    >>> s = "[Hello|Hi]. We are [inviting | calling] you at position [[junior| mid junior]|senior] developer."
    >>> import re
    

    Now, try:

    >>> re.findall(r'\[ (?:[^][]* \[ [^][]* \])* [^][]*  \]', s, re.X)
    ['[Hello|Hi]', '[inviting | calling]', '[[junior| mid junior]|senior]']
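To see why the original pattern falls short, it may help to run both patterns side by side. The sketch below copies the question's pattern verbatim and compares it with the one above; the variable names (`original`, `fixed`, `inner_only`, `outermost`) are only illustrative. Because the original character class excludes every bracket, it can only ever match a group that has no brackets inside it, so it returns the innermost group instead of the enclosing one.

```python
import re

s = "[Hello|Hi]. We are [inviting | calling] you at position [[junior| mid junior]|senior] developer."

# The pattern from the question, copied verbatim.
original = r'(\[[^\[$\]\]]*\])'
# The pattern suggested above (needs re.X because it contains spaces).
fixed = r'\[ (?:[^][]* \[ [^][]* \])* [^][]*  \]'

inner_only = re.findall(original, s)    # bracket-free groups only
outermost = re.findall(fixed, s, re.X)  # allows one nested level inside

print(inner_only)
print(outermost)
```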
    

    In more detail

    Consider this script:

    $ cat script.py
    import re
    s = "[Hello|Hi]. We are [inviting | calling] you at position [[junior| mid junior]|senior] developer."
    
    matches = re.findall(r'''\[       # Opening bracket
            (?:[^][]* \[ [^][]* \])*  # Zero or more repetitions of: non-bracket characters, then one inner [...] group with no brackets inside
            [^][]*                    # Zero or more non-bracket characters
            \]                        # Closing bracket
            ''',
            s,
            re.X)
    print('\n'.join(matches))
    

    This produces the output:

    $ python script.py
    [Hello|Hi]
    [inviting | calling]
    [[junior| mid junior]|senior]
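Note that the pattern above allows exactly one level of nesting inside the outer brackets, which is all the sample text needs. If deeper nesting can occur, one possible alternative is to skip the regex and count bracket depth by hand. The following is only a sketch under that assumption: `top_level_groups` is a hypothetical helper name, unmatched closing brackets are ignored, and a group that is never closed is silently dropped.

```python
def top_level_groups(text):
    """Return every outermost [...] group in text, at any nesting depth."""
    groups = []
    depth = 0     # number of brackets currently open
    start = None  # index of the '[' that opened the current outermost group
    for i, ch in enumerate(text):
        if ch == '[':
            if depth == 0:
                start = i
            depth += 1
        elif ch == ']' and depth > 0:
            depth -= 1
            if depth == 0:
                # The outermost group just closed; keep it with its brackets.
                groups.append(text[start:i + 1])
    return groups


s = "[Hello|Hi]. We are [inviting | calling] you at position [[junior| mid junior]|senior] developer."
print('\n'.join(top_level_groups(s)))
```

For the sample string this prints the same three groups as the regex, and it also copes with inputs such as "a [x|[y|[z|w]]] b", where the brackets are nested three deep.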