I'm trying to create a regular expression to match a pattern of the form:
content = "identifier{ {{var1}} rest of the content} outer content identifier{ {{var2}} another content} identifier{ content with no vars }"
So, supposing I run re.findall on content, the return value should be:
["{{var1}} rest of the content", "{{var2}} another content", "content with no vars"]
The pattern I want to match is identifier\{.*?\}
but I don't know how to make it work, because the closing delimiter also appears inside the text I want to match. It either stops too early when I make the quantifier non-greedy, or it merges the separate blocks into one match when greedy, which is something I don't want.
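A quick check reproduces both failure modes. This is only a minimal sketch: the capture groups and the variable names lazy and greedy are added here for illustration and are not part of the pattern above.

```python
import re

content = "identifier{ {{var1}} rest of the content} outer content identifier{ {{var2}} another content} identifier{ content with no vars }"

# Non-greedy: each match stops at the first "}", which sits inside "{{varN}}".
lazy = re.findall(r"identifier\{(.*?)\}", content)
print(lazy)

# Greedy: the match runs to the last "}" in the string, merging all blocks.
greedy = re.findall(r"identifier\{(.*)\}", content)
print(len(greedy))
```

The non-greedy version cuts each block off inside the variable, and the greedy version returns a single merged match.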
If you want to allow multiple occurrences of {{var}} and not allow any other curly braces:
{([^{}]*(?:{{[^{}]*}}[^{}]*)*)}
The pattern matches:
{          Match literally
(          Capture group 1
[^{}]*     Match optional chars other than { and }
(?:        Non-capturing group
{{         Match literally
[^{}]*     Match optional chars other than { and }
}}         Match literally
[^{}]*     Match optional chars other than { and }
)*         Close the non-capturing group and optionally repeat it
)          Close group 1
}          Match literally

To remove the leading and trailing whitespace you can use strip:
import re
pattern = r"{([^{}]*(?:{{[^{}]*}}[^{}]*)*)}"
content = "identifier{ {{var1}} rest of the content} outer content identifier{ {{var2}} another content} identifier{ content with no vars }"
print([s.strip() for s in re.findall(pattern, content)])
Output
['{{var1}} rest of the content', '{{var2}} another content', 'content with no vars']
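The pattern above matches any brace block, whatever precedes it. If only blocks introduced by the word identifier should match, the prefix can probably just be prepended. The following is a sketch under that assumption; the sample string, including its other{ ... } block, is made up for illustration.

```python
import re

# Assumption: only blocks introduced by the literal word "identifier" should match.
pattern = r"identifier{([^{}]*(?:{{[^{}]*}}[^{}]*)*)}"

# Hypothetical input: the "other{ ... }" block is invented to show what gets skipped.
sample = "identifier{ {{var1}} text } other{ skipped } identifier{ plain }"

result = [s.strip() for s in re.findall(pattern, sample)]
print(result)
```

Without the prefix, the same sample would also return the contents of the other{ ... } block.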
If you want the capture group values without the surrounding whitespace, and want to allow only a single opening and closing curly brace, you can use non-greedy quantifiers with negative lookarounds:
(?<!{){\s*([^{}]*?(?:{{[^{}]+}}[^{}]*?)*)\s*}(?!})
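As a usage sketch, the lookaround pattern can be passed to re.findall in the same way as the first one. The result variable is just a name chosen for this example, and no strip call is needed because the \s* parts sit outside the capture group.

```python
import re

pattern = r"(?<!{){\s*([^{}]*?(?:{{[^{}]+}}[^{}]*?)*)\s*}(?!})"
content = "identifier{ {{var1}} rest of the content} outer content identifier{ {{var2}} another content} identifier{ content with no vars }"

# The surrounding whitespace is consumed by \s* outside group 1.
result = re.findall(pattern, content)
print(result)
```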