python regex python-re

re match text enclosed in { } where text may contain {{var}}


I'm trying to create a regular expression to match a pattern of the form:

content = "identifier{ {{var1}} rest of the content} outer content identifier{ {{var2}} another content} identifier{ content with no vars }"

so if I run re.findall on content, the return value should be:

["{{var1}} rest of the content", "{{var2}} another content", "content with no vars"]

The pattern I want to match is identifier\{.*?\}, but I can't make it work because the closing delimiter } also appears inside the text I want to match. With a non-greedy quantifier the match ends too early, at the first } of {{var}}; with a greedy quantifier it merges the separate blocks into one match, which I don't want.
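
The two failure modes described above can be reproduced with a short sketch. The capture groups and the variable names `lazy` and `greedy` are added here for illustration only and are not part of the original pattern:

```python
import re

content = "identifier{ {{var1}} rest of the content} outer content identifier{ {{var2}} another content} identifier{ content with no vars }"

# Non-greedy: each match stops at the first "}" it meets,
# which is the first closing brace of "{{var}}".
lazy = re.findall(r"identifier\{(.*?)\}", content)
print(lazy)  # [' {{var1', ' {{var2', ' content with no vars ']

# Greedy: a single match runs from the first "identifier{" to the last "}".
greedy = re.findall(r"identifier\{(.*)\}", content)
print(len(greedy))  # 1
```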


Solution

  • If you want to allow multiple occurrences of {{var}} and not allow any other curly braces:

    {([^{}]*(?:{{[^{}]*}}[^{}]*)*)}
    

    The pattern matches:

    • { Match literally
    • ( Capture group 1
      • [^{}]* Match optional chars other than { and }
      • (?: Non capture group
        • {{ Match literally
        • [^{}]* Match optional chars other than { and }
        • }} Match literally
        • [^{}]* Match optional chars other than { and }
      • )* Close the non capture group and optionally repeat it
    • ) Close group 1
    • } Match literally

    To remove the leading and trailing whitespace you can use strip:

    import re
    
    pattern = r"{([^{}]*(?:{{[^{}]*}}[^{}]*)*)}"
    content = "identifier{ {{var1}} rest of the content} outer content identifier{ {{var2}} another content} identifier{ content with no vars }"
    
    print([s.strip() for s in re.findall(pattern, content)])
    

    Output

    ['{{var1}} rest of the content', '{{var2}} another content', 'content with no vars']
    

    If you want the capture group values without the surrounding whitespace, and want to allow only a single opening and closing curly brace, you can use non-greedy quantifiers with negative lookarounds:

    (?<!{){\s*([^{}]*?(?:{{[^{}]+}}[^{}]*?)*)\s*}(?!})
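
As a minimal sketch of how this second pattern can be used, the snippet below runs it on the same content string from the question. No call to strip is needed, because the \s* parts sit outside the capture group. The variable name `matches` is only illustrative:

```python
import re

# Lookarounds reject "{{" before the opening brace and "}}" after the closing one;
# \s* outside the group keeps surrounding whitespace out of the captured value.
pattern = r"(?<!{){\s*([^{}]*?(?:{{[^{}]+}}[^{}]*?)*)\s*}(?!})"
content = "identifier{ {{var1}} rest of the content} outer content identifier{ {{var2}} another content} identifier{ content with no vars }"

matches = re.findall(pattern, content)
print(matches)
# ['{{var1}} rest of the content', '{{var2}} another content', 'content with no vars']
```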
    
