I have a file with the following content:
#### v2
START MATCH
Text explaining things and stuff.
This has to be matched.
END MATCH
#### v1
Do not match this part (or anything
below "END MATCH" part).
#### v0
Do not match this either.
I'm trying to match everything between START MATCH and END MATCH (including new lines). However, the text below END MATCH might not exist; instead, it can be the end of the file. Also, there is no literal END MATCH text; it's just a marker to show what I'm trying to achieve.
I was trying out the following regex pattern:
(?<=# v.\n\n)(.|\n)*(?=\n(?:#.*?|$))
It seems fine to me if the file ends with END MATCH, but if there are additional lines below it (starting with a new line and a # character), my pattern captures that part as well.
How can I modify my pattern to exclude everything after END MATCH? Probably only the last part, (?=\n(?:#.*?|$)), needs to change.
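To see the over-matching concretely, here is a small Python reproduction (Python is my choice; the question does not name a language). The file content is a hypothetical reconstruction, and the blank line after each heading is an assumption, since the lookbehind (?<=# v.\n\n) needs one to match at all.

```python
import re

# Hypothetical reconstruction of the question's file. The blank line after each
# "#### vN" heading is an assumption: the question's lookbehind needs "\n\n" there.
SAMPLE = (
    "#### v2\n"
    "\n"
    "START MATCH\n"
    "Text explaining things and stuff.\n"
    "This has to be matched.\n"
    "END MATCH\n"
    "#### v1\n"
    "\n"
    "Do not match this part (or anything\n"
    'below "END MATCH" part).\n'
    "#### v0\n"
    "\n"
    "Do not match this either."
)

# The question's pattern, unchanged. Python's re accepts it because the
# lookbehind "# v.\n\n" always spans exactly six characters.
QUESTION_PATTERN = r"(?<=# v.\n\n)(.|\n)*(?=\n(?:#.*?|$))"

match = re.search(QUESTION_PATTERN, SAMPLE)
# The greedy (.|\n)* runs to the end of the string and then backtracks only as
# far as the LAST "\n#", so the v1 section is swallowed as well.
print(match.group(0))
```

Because (.|\n)* is greedy, the engine first consumes the whole string and then gives characters back until the lookahead succeeds, which happens at the last "\n#" in the file rather than the first one.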
The example can be tested here: https://regex101.com/r/bJAZZq/1
You can use
#\h+v\d+\R{2}\K(?s:.*?)(?=\R#|\z)
See the regex demo.
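Details
- #: a # char
- \h+: one or more horizontal whitespace characters
- v: a v letter
- \d+: one or more digits
- \R{2}: two line break sequences
- \K: omit the text matched so far from the match result
- (?s:.*?): any zero or more characters, including line breaks (thanks to the s flag), as few as possible
- (?=\R#|\z): up to the first occurrence of a line break sequence followed by #, or the end of the string
Please note that (.|\n)* is a very bad regex construct: because every character goes through a separate alternation that the engine may have to backtrack into, it consumes a lot of computational resources and leads to performance issues, so you should never use it. Use a dot with the s (DOTALL) flag instead, as in (?s:.*?) above.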
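The pattern above targets PCRE (as on regex101). Python's built-in re module does not support \h, \R or \K, so the sketch below is a hand translation under that assumption: [ \t] stands in for \h, (?:\r\n|\r|\n) for \R, a capturing group for \K, and \Z (which in Python anchors at the very end of the string) for \z. The first_section helper name is hypothetical.

```python
import re
from typing import Optional

# Python's re has no \h, \R or \K, so this is a hand translation of the answer's
# pattern (a sketch, not the original PCRE regex):
#   \h+  -> [ \t]+             spaces/tabs only (PCRE's \h covers more)
#   \R   -> (?:\r\n|\r|\n)     the common line break sequences
#   \K   -> capturing group 1  keep only the text after the heading
#   \z   -> \Z                 in Python, \Z is the very end of the string
SECTION = re.compile(
    r"""
    \#[ \t]+v\d+              # '#', horizontal space, 'v', digits
    (?:\r\n|\r|\n){2}         # two line break sequences
    (.*?)                     # section body, as few chars as possible
    (?=(?:\r\n|\r|\n)\#|\Z)   # stop before a line break + '#', or at the end
    """,
    re.DOTALL | re.VERBOSE,   # DOTALL plays the role of the inline (?s:...)
)


def first_section(text: str) -> Optional[str]:
    """Return the body of the first '# vN' section, or None if there is none."""
    m = SECTION.search(text)
    return m.group(1) if m else None


body = "START MATCH\nText explaining things and stuff.\nThis has to be matched.\nEND MATCH"
# More sections follow: the match stops before "\n#### v1".
print(repr(first_section("#### v2\n\n" + body + "\n#### v1\n\nDo not match this.")))
# The section runs to the end of the file.
print(repr(first_section("#### v2\n\n" + body)))
# Windows line endings.
print(repr(first_section("#### v2\r\n\r\nSTART MATCH\r\nEND MATCH\r\n#### v1\r\n\r\nx")))
```

The lazy (.*?) combined with the lookahead is what fixes the question's problem: the engine stops at the first "line break + #" instead of the last one.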
The \R construct is very useful in PCRE and similar regex engines: it matches any kind of line break sequence, such as \r, \n and \r\n, and sometimes even more, depending on the options or the exact library implementation.
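To illustrate what \R covers, here is a rough Python equivalent (Python's re has no \R). The exact character list is my approximation of PCRE's default Unicode line-break set and should be treated as an assumption; with PCRE's ANYCRLF setting, \R only matches \r, \n and \r\n. The ANY_LINE_BREAK and split_on_any_break names are hypothetical.

```python
import re
from typing import List

# Rough stand-in for PCRE's \R (an assumption, not an exact replica):
# "\r\n" is tried first so it counts as ONE break, then any single
# line-break character: LF, VT, FF, CR, NEL (U+0085), LS (U+2028), PS (U+2029).
ANY_LINE_BREAK = r"(?:\r\n|[\n\v\f\r\x85\u2028\u2029])"


def split_on_any_break(text: str) -> List[str]:
    """Split text on every line break sequence, whatever its style."""
    return re.split(ANY_LINE_BREAK, text)


print(split_on_any_break("unix\nwindows\r\nold mac\rend"))
# ['unix', 'windows', 'old mac', 'end']
```

Putting \r\n first in the alternation matters: if the single-character class came first, a Windows line ending would be treated as two separate breaks and leave an empty string between them.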
\K lets you use the + and * quantifiers in the part of the pattern that comes before the text you actually want to grab, unlike lookbehinds, where the pattern length must be fixed.
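Python's re also requires fixed-width lookbehinds, which makes the limitation easy to demonstrate. Since it has no \K either, the usual workaround there is a capturing group (a swapped-in technique, not \K itself). The try_compile helper is hypothetical.

```python
import re


def try_compile(pattern: str) -> bool:
    """Return True if Python's re accepts the pattern, False on re.error."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


# '+' makes the lookbehind's width variable, so re rejects it.
print(try_compile(r"(?<=#[ \t]+v\d+\n\n).+"))  # False
# The question's lookbehind always spans six characters, so it compiles.
print(try_compile(r"(?<=# v.\n\n).+"))  # True

# Workaround without \K: match the variable-length prefix normally and
# keep only capturing group 1.
m = re.search(r"#[ \t]+v\d+\n\n(.+)", "#### v2\n\nSTART MATCH")
print(m.group(1))  # START MATCH
```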