For example, I have this text (all on one line, with no newlines; this is important):
<div> ffjdklfjdklfjs 2015 ddddd </div> sfsfsfsfsf <div> hkh/ <> -%=:;.éggggggggggg 2018 dsqkdlmqs </div> fdfdfd </div><div> ffjdklfjdklfjs 2023 ddddd </div> sfsfsfsfsf <div> hkh/ <> -%=:;.éjhjk 2018 / dsqkdlmqs </div> fdfdfd </div>
I would like a regex that finds every sequence of the form <div>...2018....</div>, containing only the text between the tags,
for the date 2018 only and not the others.
The result must be 2 matches:
<div>hkh/ <> -%=:;.éggggggggggg 2018 dsqkdlmqs </div>
<div>hkh/ <> -%=:;.éjhjk 2018 / dsqkdlmqs </div>
I wrote this regex (I am coding in Python):
r"<div>(?=.*?2018).*?<\/div>" /g
But it doesn't work. The result is 4 matches:
<div> ffjdklfjdklfjs 2015 ddddd </div>
<div> hkh/ <> -%=:;.éggggggggggg 2018 dsqkdlmqs </div>
<div> ffjdklfjdklfjs 2023 ddddd </div>
<div> hkh/ <> -%=:;.éjhjk 2018 / dsqkdlmqs </div>
I don't want to match <div> ffjdklfjdklfjs 2015 ddddd </div>
or <div> ffjdklfjdklfjs 2023 ddddd </div>,
but I can't find the solution :(
Try this code:
import re
text = """<div> ffjdklfjdklfjs 2023 ddddd </div> sfsfsfsfsf
<div> hkhjhjk 2018 / dsqkdlmqs </div> fdfdfd </div>"""
result = re.search(r'(?=<div)(?=.*?2018)[\s\S]*?(?:<\/div>)', text)
print(result[0]) # <div> hkhjhjk 2018 / dsqkdlmqs </div>
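One caveat worth checking: the snippet above appears to depend on the newline inside `text`. By default `.` does not match a newline, so the `(?=.*?2018)` lookahead cannot see past the end of the first line. A minimal sketch of what happens when the same sample is collapsed onto one line, as in the question (the name `single_line` is only illustrative):

```python
import re

pattern = r'(?=<div)(?=.*?2018)[\s\S]*?(?:<\/div>)'

# Same sample as above, but with the newline replaced by a space.
single_line = ("<div> ffjdklfjdklfjs 2023 ddddd </div> sfsfsfsfsf "
               "<div> hkhjhjk 2018 / dsqkdlmqs </div> fdfdfd </div>")

result = re.search(pattern, single_line)
# The lookahead now finds 2018 further along the same line,
# so the first div is returned even though it contains 2023.
print(result[0])  # <div> ffjdklfjdklfjs 2023 ddddd </div>
```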
=================
Edit:
import re
text1 = """<div> ffjdklfjdklfjs 2023 ddddd </div> sfsfsfsfsf
<div> hkhjhjk 2018 / dsqkdlmqs </div> fdfdfd </div>"""
text2 = """<div> ffjdklfjdklfjs 2023 ddddd </div> sfsfsfsfsf
<div> hk
hjhjk 2018 / dsqkdlmqs </div> fdfdfd </div>"""
reg = re.compile(r'<div>(?=[^<]*?2018)[\s\S]*?<\/div>')
result1 = reg.search(text1)
print(result1[0]) # <div> hkhjhjk 2018 / dsqkdlmqs </div>
result2 = reg.search(text2)
print(result2[0]) # <div> hk\nhjhjk 2018 / dsqkdlmqs </div>
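For the single-line text from the question, where a div can also contain a stray `<` (as in `<>`), the `[^<]*?` lookahead may stop before it reaches 2018. One possible alternative, offered as a sketch rather than a definitive solution, is to check at every character that the scan is not about to cross a `<div>` or `</div>`. The names `DIV_2018`, `text` and `matches` are illustrative, and the pattern assumes the divs are not nested and carry no attributes:

```python
import re

# The single-line sample from the question, split only for readability.
text = ("<div> ffjdklfjdklfjs 2015 ddddd </div> sfsfsfsfsf "
        "<div> hkh/ <> -%=:;.éggggggggggg 2018 dsqkdlmqs </div> fdfdfd </div>"
        "<div> ffjdklfjdklfjs 2023 ddddd </div> sfsfsfsfsf "
        "<div> hkh/ <> -%=:;.éjhjk 2018 / dsqkdlmqs </div> fdfdfd </div>")

DIV_2018 = re.compile(
    r"<div>"
    r"(?:(?!</?div>).)*?"   # any characters, but never across <div> or </div>
    r"2018"
    r"(?:(?!</?div>).)*?"   # same restriction after the date
    r"</div>",
    re.DOTALL,              # let . match newlines too, in case the input has any
)

matches = DIV_2018.findall(text)
for m in matches:
    print(m)
# <div> hkh/ <> -%=:;.éggggggggggg 2018 dsqkdlmqs </div>
# <div> hkh/ <> -%=:;.éjhjk 2018 / dsqkdlmqs </div>
```

Because the pattern has no capturing groups, `findall` returns the full matched strings. For real HTML, an HTML parser is usually a safer choice than a regex.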