Simplify the problem:
There is an article (long text).
Extract the content between start (inclusive) and end (inclusive).
Requirement: there cannot be more than one \n between start and end.
Find all matches.
Use Python re only.
Code:
import re

lines = re.findall(pattern, text, re.DOTALL)
for line in lines:
    print(line)
    print('===')
So, how can I fix my pattern?
Patterns I tried:
start[^\n]*\n?[^\n]*end
with text:...
start just me and python regex 1 end
start just me and python regex 2 end
start just me and python regex 3 end
...
Wrong output:
start just me and python regex 1 end
start just me and python regex 2 end --> should be split from the line before
===
start just me and python regex 3 end
===
start(?:(?!\n\n).)*?end and start(?:[^\n]|\n(?!\n))*?end
with text:
start just
me and python
regex 1 end
start just me and python regex 2 end
start just me and python regex 3 end
Wrong output:
start just
me and python
regex 1 end --> should not match, because it contains two `\n`
===
start just me and python regex 2 end
===
start just me and python regex 3 end
===
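You can use: start[^\n]*?\n?[^\n]*?end
The lazy quantifiers (*?) stop at the first end instead of the last one, and since neither [^\n]*? can cross a line break, at most one \n (the optional \n?) can appear between start and end.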
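As a quick check, here is a minimal sketch that runs this pattern against the two sample texts from the question. The variable names and the helper find_blocks are made up for illustration, and re.DOTALL is left out because the pattern contains no dot, so the flag has no effect on it.

```python
import re

# Lazy version of the pattern: each [^\n]*? stays on one line,
# and the single optional \n allows at most one line break in a match.
PATTERN = r"start[^\n]*?\n?[^\n]*?end"

# First sample: three single-line blocks (the greedy pattern merged 1 and 2).
text_single_lines = (
    "start just me and python regex 1 end\n"
    "start just me and python regex 2 end\n"
    "start just me and python regex 3 end"
)

# Second sample: the first block spans three lines (two \n), so it must be skipped.
text_two_newlines = (
    "start just\n"
    "me and python\n"
    "regex 1 end\n"
    "start just me and python regex 2 end\n"
    "start just me and python regex 3 end"
)


def find_blocks(text):
    """Hypothetical helper: return every start...end block with at most one \\n."""
    return re.findall(PATTERN, text)


for sample in (text_single_lines, text_two_newlines):
    for block in find_blocks(sample):
        print(block)
        print("===")
```

The first sample yields three separate matches, and the second yields only the "regex 2" and "regex 3" lines. A block with exactly one line break, such as "start a\nb end", is still matched.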