Regex dealing with brackets

I have multiple strings like

string1 = """[[拱|{{{#!html}}}]][br]팔짱낄 공''':'''"""
string2 = """[[顆|{{{#!html}}}]][br]낟알 과'''-'''[* some annotation that may include quote marks(', ") and brackets("(", ")", "[[", "]]").]""" 
string3 = """[[廓|{{{#!html}}}]][br]둘레 곽[br]클 확[* another annotation.][* another annotation.]"""
strings = [string1, string2, string3]

Every string contains one or more "[br]"s.

Each string may or may not include annotations.

Every annotation starts with "[*" and ends with "]". It may include double brackets ("[[" and "]]"), but never single ones ("[" and "]"), so there won't be any confusion (e.g. [* some annotation with [[brackets]]]).
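The annotation rule above can also be written as a standalone pattern, which makes it easy to check what counts as an annotation. A minimal sketch, assuming (as stated) that an annotation contains only complete "[[...]]" pairs and never single brackets; the name ANNOTATION is hypothetical:

```python
import re

# Hypothetical name. An annotation is "[*", then any mix of complete "[[...]]"
# pairs and non-bracket characters, then a single closing "]".
ANNOTATION = re.compile(r"\[\*(?:\[\[.*?]]|[^\[\]])*]")

sample = "[[廓|{{{#!html}}}]][br]text[* some annotation with [[brackets]]][* another annotation.]"
print(ANNOTATION.findall(sample))
# ['[* some annotation with [[brackets]]]', '[* another annotation.]']
```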

The text I want to replace is everything between the first "[br]" and the first annotation (or the end of the string if there is none). For the three strings above, that is:

word1 = """팔짱낄 공''':'''"""
word2 = """낟알 과'''-'''"""
word3 = """둘레 곽[br]클 확"""
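Described like this, the rule does not strictly need a regex. A minimal sketch using only str.partition and str.find (the function name replace_body is hypothetical), assuming "[*" only ever appears as the start of an annotation:

```python
def replace_body(s: str, repl: str = "AAAA") -> str:
    """Replace the text between the first "[br]" and the first "[*" (or the end)."""
    # head = everything before the first "[br]"; sep = "[br]" itself ("" if absent)
    head, sep, rest = s.partition("[br]")
    if not sep:
        return s  # no "[br]": nothing to replace
    # Assumption: "[*" only ever appears as the opening of an annotation.
    cut = rest.find("[*")
    if cut == -1:
        cut = len(rest)  # no annotation: replace up to the end of the string
    return head + sep + repl + rest[cut:]


string1 = """[[拱|{{{#!html}}}]][br]팔짱낄 공''':'''"""
string2 = """[[顆|{{{#!html}}}]][br]낟알 과'''-'''[* some annotation that may include quote marks(', ") and brackets("(", ")", "[[", "]]").]"""
string3 = """[[廓|{{{#!html}}}]][br]둘레 곽[br]클 확[* another annotation.][* another annotation.]"""

for s in (string1, string2, string3):
    print(replace_body(s))
```

For the three sample strings this yields the expected output listed further down.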

So I tried

import re

for string in strings:
    print(re.sub(r"\[br\](.)+?(\[\*)+", "AAAA", string))

expecting something like

[[拱|{{{#!html}}}]][br]AAAA
[[顆|{{{#!html}}}]][br]AAAA[* some annotation that may include quote marks(', ") and brackets("(", ")", "[[", "]]").]
[[廓|{{{#!html}}}]][br]AAAA[* another annotation.][* another annotation.]

The logic for the regex was

\[br\] : the first "[br]"

(.)+? : one or more characters that I want to replace, lazy

(\[\*)+ : one or more "[*"s

But the result was

[[拱|{{{#!html}}}]][br]팔짱낄 공''':'''
[[顆|{{{#!html}}}]]AAAA some annotation that may include quote marks(', ") and brackets("(", ")", "[[", "]]").]
[[廓|{{{#!html}}}]]AAAA another annotation.][* another annotation.]

instead. I also tried r"\[br\](.)+?(\[\*)*", but it still doesn't work. How can I fix this?
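Printing what each attempt actually matches helps explain this output. A small sketch using re.search, which returns the first match, i.e. the first span re.sub would replace:

```python
import re

string1 = """[[拱|{{{#!html}}}]][br]팔짱낄 공''':'''"""
string2 = """[[顆|{{{#!html}}}]][br]낟알 과'''-'''[* some annotation that may include quote marks(', ") and brackets("(", ")", "[[", "]]").]"""
string3 = """[[廓|{{{#!html}}}]][br]둘레 곽[br]클 확[* another annotation.][* another annotation.]"""

# The two attempts from the question.
attempts = [r"\[br\](.)+?(\[\*)+", r"\[br\](.)+?(\[\*)*"]

for pat in attempts:
    for s in (string1, string2, string3):
        m = re.search(pat, s)  # first match, or None if nothing matches
        print(pat, "->", repr(m.group(0)) if m else None)
```

With the first pattern the match begins at "[br]" and ends after "[*", so both markers are replaced together with the text, and string1, which has no "[*", is not matched at all. With the second, (\[\*)* may match zero times, so the lazy (.)+? stops after a single character. A lookahead such as (?=\[\*) checks for "[*" without consuming it.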

Solution

You could use

^(.*?\[br]).+?(?=\[\*.*?](?<!].)(?!])|$)

The pattern matches

^ Start of string
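(.*?\[br]) Capture group 1, match as few chars as possible up to and including the first occurrence of [br]
.+? Match 1+ chars (any char except a newline), as few as possible
(?= Positive lookahead, assert that what is directly to the right is
- \[\*.*?](?<!].)(?!]) [* followed by chars up to a ] that is neither preceded nor followed by another ] (so not part of ]])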
- | Or
- $ Assert end of string
) Close lookahead
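The breakdown above maps one-to-one onto a re.VERBOSE version of the same pattern, in which whitespace is ignored and # starts a comment. A sketch, assuming Python's re module; the variable names are only illustrative:

```python
import re

# Same pattern as above, written with re.VERBOSE so each part carries its own comment.
pattern = re.compile(r"""
    ^(.*?\[br])      # group 1: shortest prefix ending with the first [br]
    .+?              # text to replace: 1+ chars, as few as possible
    (?=              # stop as soon as what follows is either
        \[\*.*?]     #   "[*" up to a "]"
        (?<!].)      #   that is not preceded by another "]"
        (?!])        #   and not followed by another "]"
      | $            #   or the end of the line
    )
""", re.MULTILINE | re.VERBOSE)

lines = "\n".join([
    """[[拱|{{{#!html}}}]][br]팔짱낄 공''':'''""",
    """[[廓|{{{#!html}}}]][br]둘레 곽[br]클 확[* another annotation.][* another annotation.]""",
])
print(pattern.sub(r"\1AAAA", lines))
```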

Replace with capture group 1 followed by AAAA, i.e. \1AAAA
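In Python's re.sub, the replacement refers to capture group 1 as \1 or \g<1>, and a function can be passed instead of a string. The "$1" syntax used by some other regex flavors is not a group reference in Python and is inserted literally. A short sketch on the first sample string:

```python
import re

pattern = r"^(.*?\[br]).+?(?=\[\*.*?](?<!].)(?!])|$)"
s = "[[拱|{{{#!html}}}]][br]팔짱낄 공''':'''"

# All three put group 1 back and append AAAA.
by_number = re.sub(pattern, r"\1AAAA", s)
by_name = re.sub(pattern, r"\g<1>AAAA", s)  # stays unambiguous if the text after it starts with a digit
by_function = re.sub(pattern, lambda m: m.group(1) + "AAAA", s)
print(by_number == by_name == by_function, by_number)

# "$1" is not special in Python's re: the whole match becomes the literal text "$1AAAA".
print(re.sub(pattern, "$1AAAA", s))
```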


Example code

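import re

# Group 1 keeps everything up to and including the first [br]; the lazy .+?
# then stops right before the first annotation or at the end of the line.
pattern = r"^(.*?\[br]).+?(?=\[\*.*?](?<!].)(?!])|$)"

s = ("[[拱|{{{#!html}}}]][br]팔짱낄 공''':'''\n"
     "[[顆|{{{#!html}}}]][br]낟알 과'''-'''[* some annotation that may include quote marks(', \") and brackets(\"(\", \")\", \"[[\", \"]]\").]\n"
     "[[廓|{{{#!html}}}]][br]둘레 곽[br]클 확[* another annotation.][* another annotation.]")

# re.MULTILINE lets ^ and $ match at every line of s; \1 puts group 1 back.
result = re.sub(pattern, r"\1AAAA", s, flags=re.MULTILINE)
print(result)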

Output

[[拱|{{{#!html}}}]][br]AAAA
[[顆|{{{#!html}}}]][br]AAAA[* some annotation that may include quote marks(', ") and brackets("(", ")", "[[", "]]").]
[[廓|{{{#!html}}}]][br]AAAA[* another annotation.][* another annotation.]
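One case worth testing: the lookahead only accepts a closing "]" that is not next to another "]", so an annotation that itself ends in "]]]" (like the "[* some annotation with [[brackets]]]" example in the question) is not recognized there, and .+? runs on to the end of the line, replacing the annotation as well. Since every annotation starts with "[*", a lookahead that only checks for the opening "[*" may be enough, assuming "[*" never occurs in the text to be replaced. A sketch comparing both on a made-up input (the simpler pattern is an alternative, not part of the answer above):

```python
import re

# Pattern from the answer above.
answer_pattern = r"^(.*?\[br]).+?(?=\[\*.*?](?<!].)(?!])|$)"
# Simpler alternative (an assumption, not from the answer): stop at the first "[*".
simple_pattern = r"^(.*?\[br]).+?(?=\[\*|$)"

# Made-up input whose annotation ends in "]]]".
t = "[[X|{{{#!html}}}]][br]abc[* note with [[link]]]"

print(re.sub(answer_pattern, r"\1AAAA", t))  # [[X|{{{#!html}}}]][br]AAAA
print(re.sub(simple_pattern, r"\1AAAA", t))  # [[X|{{{#!html}}}]][br]AAAA[* note with [[link]]]
```

On the three sample strings from the question both patterns give the same result; they only differ when an annotation ends in "]]]".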