Tags: python, regex, regex-group

Regular expression to identify groups


import re

content = "[row1|col1]:{value:{{calculate}}<<report_1>>[Row2|col2];format:float;} [hiddenr0120a|c0012]:{format:float;}"
regex = re.compile(r"(\[.*?\]).*?\{(.*?)\}")
entries = regex.findall(content)
# Output
# [('[row1|col1]', 'value:{{calculate'), ('[Row2|col2]', 'format:float;')]
# Expected output
# [("[row1|col1]", "{value:{{calculate}}<<report_1>>[Row2|col2];format:float;}"), ("[hiddenr0120a|c0012]", "{format:float;}")]

I have tried the regex "(\[.*?\]).*?\{(.*?)\}", but its second group ends at the first "}", so it fails for the first entry.
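To make the failure easier to see, here is a minimal sketch that keeps only the brace part of the pattern. The names `lazy` and `first` are just for illustration; the point is that a lazy group followed by a literal } stops at the first } it can reach.

```python
import re

content = "[row1|col1]:{value:{{calculate}}<<report_1>>[Row2|col2];format:float;} [hiddenr0120a|c0012]:{format:float;}"

# Lazy quantifier: expand one character at a time and stop as soon as
# the rest of the pattern (here just a closing brace) can match.
lazy = re.compile(r"\{(.*?)\}")
first = lazy.search(content)

print(first.group(1))  # value:{{calculate
```

Nothing in this pattern says where the value really ends, so the nested braces cut the match short.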


Solution

  • You might use

    (\[[^\[\]\r\n]*\]):({.*?})(?: (?=\[)|$)
    

    In parts

    • ( Capture group 1
      • \[[^\[\]\r\n]*\] Match from [ to the closing ], with no square brackets or newlines in between
    • ) Close group 1
    • : Match a colon literally (or use .*? to match other characters as well)
    • ( Capture group 2
      • {.*?} Match from { to }, with as few characters as possible in between
    • ) Close group 2
    • (?: Non-capture group
      •  (?=\[) Match a literal space and assert that what is on the right is [
      • | Or
      • $ Assert the end of the string
    • ) Close non-capture group
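The parts listed above can also be written into the pattern itself with Python's re.VERBOSE flag. This is only a sketch of the same regex, not a different approach. Two details differ from the one-line form: in verbose mode an unescaped space is ignored, so the literal space is written as [ ], and the braces are escaped here purely for readability.

```python
import re

content = "[row1|col1]:{value:{{calculate}}<<report_1>>[Row2|col2];format:float;} [hiddenr0120a|c0012]:{format:float;}"

pattern = re.compile(
    r"""
    (\[[^\[\]\r\n]*\])   # group 1: [ ... ] without nested brackets or newlines
    :                    # literal colon
    (\{.*?\})            # group 2: { ... } as short as the rest of the pattern allows
    (?:
        [ ](?=\[)        # a literal space that is followed by [
      |
        $                # or the end of the string
    )
    """,
    re.VERBOSE,
)

entries = pattern.findall(content)
for key, value in entries:
    print(key, "->", value)
```

Because group 2 is lazy but must be followed by either " [" or the end of the string, it keeps expanding past the inner closing braces until one of those two conditions holds.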


    Example code

    import re
     
    content = "[row1|col1]:{value:{{calculate}}<<report_1>>[Row2|col2];format:float;} [hiddenr0120a|c0012]:{format:float;}"
    regex = re.compile(r"(\[[^\[\]\r\n]*\]):({.*?})(?: (?=\[)|$)")
    entries = regex.findall(content)
    print(entries)
    

    Output

    [('[row1|col1]', '{value:{{calculate}}<<report_1>>[Row2|col2];format:float;}'), ('[hiddenr0120a|c0012]', '{format:float;}')]