Python 3
I read a string from a file:
with open(filepath, "r", encoding="utf-8") as f:
    content_string = f.read()
It looks like this:
---
section-1-line-1
section-1-line-2
section-1-line-3
---
section-2-line-1
section-2-line-2
section-2-line-3
---
section-3-line-1
section-3-line-2
section-3-line-3
---
I need to remove the entire section that contains the line section-2-line-2.
So the end result should be:
---
section-1-line-1
section-1-line-2
section-1-line-3
---
section-3-line-1
section-3-line-2
section-3-line-3
---
So I created this regex:
rx = re.compile(r'---[^-{3}]+section-2-line-2[^-{3}]+---', re.S)
content_string_modified = re.sub(rx, '', content_string)
The regex above does nothing, i.e. it does not match. If I remove the closing ---
from the regex (r'---[^-{3}]+section-2-line-2[^-{3}]+'),
it matches partially: it finds the starting negated class, but it seems to ignore the {3} quantifier of the closing negated class
and stops at the first dash rather than at the first three dashes, so it leaves behind a chunk of the section that should be removed:
---
section-1-line-1
section-1-line-2
section-1-line-3
-2-line-3
---
section-3-line-1
section-3-line-2
section-3-line-3
---
Why? How can I make both the starting and the ending [^-{3}]+
work? Thanks!
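A character class cannot exclude a multi-character string: inside brackets, {3} is not a quantifier, so [^-{3}] simply matches any single character other than -, {, 3 or }. You can exclude a string with a negative lookahead instead.
For example, (?:(?!---).)*
will match any run of characters that does not contain three consecutive dashes.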
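To see the difference in isolation, here is a minimal sketch. The two sample strings are made up for illustration and are not taken from the question's file.

```python
import re

# Inside brackets, {3} is not a quantifier: [^-{3}] is a negated set that
# matches any single character except '-', '{', '3' and '}'.
char_class_hits = re.findall(r'[^-{3}]', 'a-b{3}c')

# The lookahead version consumes one character at a time, but only while
# the next three characters are not '---'. Single and double dashes pass.
tempered = re.match(r'(?:(?!---).)*', 'ab-c--d---e', re.S).group()

print(char_class_hits)  # ['a', 'b', 'c']
print(tempered)         # ab-c--d
```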
Your full regex will be
---(?:(?!---).)*section-2-line-2.*?(?=---)
Note that you don't need a lookahead-guarded repetition after your search phrase; a simple lazy quantifier followed by (?=---) is enough there.
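The lazy quantifier matters here. As a rough illustration, assume a shortened input with one-letter sections A, B and C (made up for this sketch). Swapping .*? for a greedy .* makes the match run to the last --- instead of the next one:

```python
import re

# Shortened stand-in for the file content: three one-line sections.
text = "---\nA\n---\nB\n---\nC\n---\n"

# Lazy: stops at the first '---' after the search phrase.
lazy = re.compile(r'---(?:(?!---).)*B.*?(?=---)', re.S)
# Greedy: runs on to the last '---', swallowing the following section too.
greedy = re.compile(r'---(?:(?!---).)*B.*(?=---)', re.S)

lazy_result = lazy.sub('', text)
greedy_result = greedy.sub('', text)

print(repr(lazy_result))    # '---\nA\n---\nC\n---\n'
print(repr(greedy_result))  # '---\nA\n---\n'
```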
Also, note that there is no need to go through re.sub
if you have already compiled your regex; you can call the sub method on the compiled pattern directly.
rx = re.compile(r'---(?:(?!---).)*section-2-line-2.*?(?=---)', re.S)
content_string_modified = rx.sub('', content_string)
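For a self-contained check, the same pattern can be wrapped in a small helper and run against the sample from the question. This is only a sketch: remove_section and block are hypothetical names introduced here, and the helper assumes the delimiter and the search phrase should be treated as literal text (hence re.escape).

```python
import re

def remove_section(text, marker, delimiter="---"):
    """Remove the delimiter-bounded section that contains `marker`.

    Hypothetical helper: builds the same regex as above from its parts.
    """
    d = re.escape(delimiter)
    pattern = re.compile(
        d + r"(?:(?!" + d + r").)*" + re.escape(marker) + r".*?(?=" + d + r")",
        re.S,  # let '.' match newlines so a section can span several lines
    )
    return pattern.sub("", text)

def block(n):
    # Three lines in the question's format: section-n-line-1 .. section-n-line-3
    return "".join(f"section-{n}-line-{i}\n" for i in (1, 2, 3))

sample = "---\n" + "---\n".join(block(n) for n in (1, 2, 3)) + "---\n"
expected = "---\n" + "---\n".join(block(n) for n in (1, 3)) + "---\n"

result = remove_section(sample, "section-2-line-2")
print(result == expected)  # True
```

If the search phrase does not occur in any section, the pattern never matches and the text comes back unchanged.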