Regex pattern for string - python

I would like to extract part of a string in this format:

Some_text Some_text 1 2 3
BEGIN Some_text Some_text
44 76 1321
Some_text Some_text
END Some_text
Some_Text Some_text 1 4 5

I would like to capture everything from BEGIN to END, like this:

Some_text Some_text 1 2 3
<!-- START -->
BEGIN Some_text Some_text
44 76 1321
Some_text Some_text
END <!-- END --> Some_text

Some_Text Some_text 1 4 5

<!-- START --> and <!-- END --> are just comments marking the start and end of the group. I only want the text between BEGIN and END.

I have something like this, but it doesn't work in every case; when there is a lot of data, it fails:

reg = re.compile(rf"{begin}[\-\s]+(.*)\n{end}", re.DOTALL)
core = re.search(reg, text).group(1)
lines = core.split("\n")

text is my string; after matching, I split the captured group into a list. I don't know how to apply this regex directly to a list, so that I could work on a Python list of lines instead of a single string.

Any tips on how to solve this would be appreciated.

Sample code:

import re
text="Some_text Some_text 1 2 3\nBEGIN Some_text Some_text\n44 76 1321\nSome_text Some_text\nEND Some_text\nSome_Text Some_text 1 4 5"

begin = "BEGIN"
end = "END"
reg = re.compile(rf"{begin}[\-\s]+(.*)\n{end}", re.DOTALL)
core = re.search(reg, text).group(1)
lines = core.split("\n")

print(lines)

It works on this sample, but sometimes it doesn't, and I don't know why, for example when the text is large (e.g. 20k words). I only want the text between BEGIN and END.

Solution
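
One likely reason the original pattern misbehaves on larger inputs (this is an assumption, since the failing input isn't shown): with re.DOTALL, the greedy (.*) runs to the end of the text and backtracks to the last "\nEND", so if the text holds more than one BEGIN/END block, everything between the first BEGIN and the last END is captured. A minimal sketch with a made-up two-block input:

```python
import re

# Hypothetical input with two BEGIN/END blocks (not taken from the question).
text = (
    "header 1 2 3\n"
    "BEGIN first\n"
    "a\n"
    "END first\n"
    "between\n"
    "BEGIN second\n"
    "b\n"
    "END second\n"
    "footer\n"
)

# The pattern from the question, with begin/end filled in.
greedy = re.compile(r"BEGIN[\-\s]+(.*)\nEND", re.DOTALL)

match = greedy.search(text)
# re.search returns None when nothing matches, so guard before calling .group().
captured = match.group(1) if match else None

# The capture runs from the first BEGIN to the last END,
# swallowing "END first", "between" and "BEGIN second" on the way.
print(captured.split("\n"))
```

Here the list contains 'END first', 'between' and 'BEGIN second' in addition to the wanted lines.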

You might use the following pattern with re.MULTILINE, so that ^ matches at the start of every line:

^BEGIN\b(.*(?:\r?\n(?!(?:BEGIN|END)\b).*)*)\r?\nEND
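
To make the pattern easier to follow, here is the same expression written with re.VERBOSE and a comment per part. It is only a sketch: the names block_re, sample and blocks are made up for this illustration, and the sample text is the one from the question.

```python
import re

block_re = re.compile(
    r"""
    ^BEGIN\b                  # BEGIN as a whole word at the start of a line
    (                         # group 1: the text between the markers
      .*                      #   the rest of the BEGIN line
      (?:                     #   then zero or more further lines:
        \r?\n                 #     a line break (LF or CRLF)
        (?!(?:BEGIN|END)\b)   #     the next line must not start with BEGIN or END
        .*                    #     the whole line
      )*
    )
    \r?\n END                 # the line break before END, then END itself
    """,
    re.MULTILINE | re.VERBOSE,  # VERBOSE ignores the whitespace and comments above
)

sample = (
    "Some_text Some_text 1 2 3\n"
    "BEGIN Some_text Some_text\n"
    "44 76 1321\n"
    "Some_text Some_text\n"
    "END Some_text\n"
    "Some_Text Some_text 1 4 5\n"
)

blocks = block_re.findall(sample)
print(blocks)
```

Because each repeated line is checked by the negative lookahead, the match stops at the first line that starts with END instead of running on to a later one.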


If you want to include BEGIN and END in the result, you can omit the capturing group and use the whole match:

^BEGIN\b.*(?:\r?\n(?!(?:BEGIN|END)\b).*)*\r?\nEND
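
A short sketch of how this variant could be used with re.finditer to get each block as a list of lines. The two-block input and the names whole_block, sample and blocks are hypothetical, chosen only for this example.

```python
import re

# The pattern above, without the capturing group.
whole_block = re.compile(
    r"^BEGIN\b.*(?:\r?\n(?!(?:BEGIN|END)\b).*)*\r?\nEND", re.MULTILINE
)

# Hypothetical input with two BEGIN/END blocks.
sample = (
    "header 1 2 3\n"
    "BEGIN first\n"
    "a\n"
    "END first\n"
    "between\n"
    "BEGIN second\n"
    "b\n"
    "END second\n"
)

# group(0) is the whole match; splitlines() turns each block into a list of lines.
blocks = [m.group(0).splitlines() for m in whole_block.finditer(sample)]
print(blocks)
```

Note that the match ends right after the word END, so any text following END on the same line is not part of the result.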


Code example

import re

regex = r"^BEGIN\b(.*(?:\r?\n(?!(?:BEGIN|END)\b).*)*)\r?\nEND"

test_str = ("Some_text Some_text 1 2 3\n"
    "BEGIN Some_text Some_text\n"
    "44 76 1321\n"
    "Some_text Some_text\n"
    "END Some_text\n"
    "Some_Text Some_text 1 4 5\n")

print(re.findall(regex, test_str, re.MULTILINE))

Output

[' Some_text Some_text\n44 76 1321\nSome_text Some_text']
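
If the text is already a Python list of lines, as mentioned in the question, a regex is not strictly needed. Below is a minimal sketch of a line-by-line scan; lines_between is a hypothetical helper, not part of any library. It assumes a marker counts only when it is the first whitespace-separated word of a line, and unlike the regex above it returns only the lines strictly between the two markers, without the rest of the BEGIN line.

```python
def lines_between(lines, begin="BEGIN", end="END"):
    """Return the lines between a BEGIN line and the next END line."""
    collected = []
    inside = False
    for line in lines:
        words = line.split()
        first = words[0] if words else ""
        if first == begin:
            # (Re)start at the most recent BEGIN line, as the regex does.
            inside, collected = True, []
        elif inside and first == end:
            return collected
        elif inside:
            collected.append(line)
    return []  # no complete BEGIN ... END block was found


lines = [
    "Some_text Some_text 1 2 3",
    "BEGIN Some_text Some_text",
    "44 76 1321",
    "Some_text Some_text",
    "END Some_text",
    "Some_Text Some_text 1 4 5",
]

result = lines_between(lines)
print(result)
```

This makes a single pass over the list and involves no backtracking, so its running time grows linearly with the number of lines.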