I have multiple strings like
string1 = """[[拱|{{{#!html}}}]][br]팔짱낄 공''':'''"""
string2 = """[[顆|{{{#!html}}}]][br]낟알 과'''-'''[* some annotation that may include quote marks(', ") and brackets("(", ")", "[[", "]]").]"""
string3 = """[[廓|{{{#!html}}}]][br]둘레 곽[br]클 확[* another annotation.][* another annotation.]"""
strings = [string1, string2, string3]
Every string contains one or more "[br]"s.
Each string may or may not include annotations.
Every annotation starts with "[*" and ends with "]". It may include double brackets("[[" and "]]"), but never single ones("[" and "]"), so there won't be any confusion (e.g. [* some annotation with [[brackets]]]).
The words I want to replace are those between the first "[br]" and the first annotation (or the end of the string if there is no annotation), which are
word1 = """팔짱낄 공''':'''"""
word2 = """낟알 과'''-'''"""
word3 = """둘레 곽[br]클 확"""
So I tried
import re
for string in strings:
    print(re.sub(r"\[br\](.)+?(\[\*)+", "AAAA", string))
expecting something like
[[拱|{{{#!html}}}]][br]AAAA
[[顆|{{{#!html}}}]][br]AAAA[* some annotation that may include quote marks(', ") and brackets("(", ")", "[[", "]]").]
[[廓|{{{#!html}}}]][br]AAAA[* another annotation.][* another annotation.]
The logic for the regex was
\[br\]
: the first "[br]"
(.)+?
: one or more characters that I want to replace, lazy
(\[\*)+
: one or more "[*"s
But the result was
[[拱|{{{#!html}}}]][br]팔짱낄 공''':'''
[[顆|{{{#!html}}}]]AAAA some annotation that may include quote marks(', ") and brackets("(", ")", "[[", "]]").]
[[廓|{{{#!html}}}]]AAAA another annotation.][* another annotation.]
instead. I also tried r"\[br\](.)+?(\[\*)*"
but it still doesn't work. How can I fix this?
You could use
^(.*?\[br]).+?(?=\[\*.*?](?<!].)(?!])|$)
The pattern matches
^
: start of string
(.*?\[br])
: capture group 1, match as few chars as possible up to the first occurrence of [br]
.+?
: match any char 1+ times, as few as possible
(?=
: positive lookahead, assert that what follows to the right is either
\[\*.*?](?<!].)(?!])
: [* up to a ] that is neither directly preceded nor directly followed by another ]
|
: or
$
: the end of the string
)
: close the lookahead
Replace with capture group 1 followed by AAAA, like \1AAAA
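To make the difference from the attempt in the question concrete, here is a small sketch comparing what each pattern actually consumes. The shortened sample string and the variable names `consuming` and `asserting` are made up for illustration, and the lookahead is simplified to `\[\*|$` so the comparison stays readable.

```python
import re

# Shortened, illustrative sample (not one of the original strings).
s = "[[顆|{{{#!html}}}]][br]낟알 과'''-'''[* note]"

# The question's attempt: "[br]" and "[*" are part of the match,
# so re.sub replaces them along with the words in between.
consuming = re.search(r"\[br\](.)+?(\[\*)+", s)
print(consuming.group(0))  # [br]낟알 과'''-'''[*

# Lookahead version: "[*" is only asserted, never consumed,
# and everything up to "[br]" is kept in group 1 for the replacement.
asserting = re.search(r"^(.*?\[br]).+?(?=\[\*|$)", s)
print(asserting.group(1))  # [[顆|{{{#!html}}}]][br]
print(asserting.group(0))  # [[顆|{{{#!html}}}]][br]낟알 과'''-'''
```

The `|$` alternative inside the lookahead is what lets a string without any annotation match as well, which is why the first string was left untouched by the original attempt.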
Example code
import re
pattern = r"^(.*?\[br]).+?(?=\[\*.*?](?<!].)(?!])|$)"
s = ("[[拱|{{{#!html}}}]][br]팔짱낄 공''':'''\n"
"[[顆|{{{#!html}}}]][br]낟알 과'''-'''[* some annotation that may include quote marks(', \") and brackets(\"(\", \")\", \"[[\", \"]]\").]\n"
"[[廓|{{{#!html}}}]][br]둘레 곽[br]클 확[* another annotation.][* another annotation.]")
subst = r"\1AAAA"  # Python uses \1 (not $1) for capture group 1
result = re.sub(pattern, subst, s, flags=re.MULTILINE)
print(result)
Output
[[拱|{{{#!html}}}]][br]AAAA
[[顆|{{{#!html}}}]][br]AAAA[* some annotation that may include quote marks(', ") and brackets("(", ")", "[[", "]]").]
[[廓|{{{#!html}}}]][br]AAAA[* another annotation.][* another annotation.]
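If the strings are processed one at a time, as in the loop from the question, a simpler variant might be enough. The sketch below is an alternative, not the pattern above: it uses a lookbehind for "[br]" and `count=1`, and it assumes the words to replace never contain "[*" themselves. Unlike the pattern above, it does not check that the annotation is properly closed. The names `WORD_RE` and `replace_words` are hypothetical.

```python
import re

# Lookbehind keeps "[br]" out of the match; the lookahead stops the lazy
# match before the first "[*", or at the end of the string if there is none.
WORD_RE = re.compile(r"(?<=\[br\]).+?(?=\[\*|$)")

def replace_words(text, replacement="AAAA"):
    # count=1 limits the substitution to the span after the first "[br]".
    # A function replacement avoids backslash processing in `replacement`.
    return WORD_RE.sub(lambda m: replacement, text, count=1)

string1 = """[[拱|{{{#!html}}}]][br]팔짱낄 공''':'''"""
string2 = """[[顆|{{{#!html}}}]][br]낟알 과'''-'''[* some annotation that may include quote marks(', ") and brackets("(", ")", "[[", "]]").]"""
string3 = """[[廓|{{{#!html}}}]][br]둘레 곽[br]클 확[* another annotation.][* another annotation.]"""

for string in (string1, string2, string3):
    print(replace_words(string))
```

Because the "[br]" is matched by a lookbehind rather than captured, the replacement does not need a backreference.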