I would like to group a string in this format:
Some_text Some_text 1 2 3
BEGIN Some_text Some_text
44 76 1321
Some_text Some_text
END Some_text
Some_Text Some_text 1 4 5
I would like to group it from BEGIN to END, like this:
Some_text Some_text 1 2 3
<!-- START -->
BEGIN Some_text Some_text
44 76 1321
Some_text Some_text
END <!-- END --> Some_text
Some_Text Some_text 1 4 5
<!-- START --> and <!-- END --> are just comments marking the start and end of the group.
I want to get only the text between BEGIN and END.
I have something like this, but it doesn't work in every case. When there is a lot of data, it fails:
reg = re.compile(rf"{begin}[\-\s]+(.*)\n{end}", re.DOTALL)
core = re.search(reg, text).group(1)
lines = core.split("\n")
Here text is my string; after grouping I convert the result to a list. I don't know how to apply this regex directly to a list, so that I could work on a Python list of lines instead of a single string.
Any tips on how I can solve this would be appreciated.
Sample code:
import re
text="Some_text Some_text 1 2 3\nBEGIN Some_text Some_text\n44 76 1321\nSome_text Some_text\nEND Some_text\nSome_Text Some_text 1 4 5"
begin = "BEGIN"
end = "END"
reg = re.compile(rf"{begin}[\-\s]+(.*)\n{end}", re.DOTALL)
core = re.search(reg, text).group(1)
lines = core.split("\n")
print(lines)
It works on this sample, but with a lot of text (e.g. 20k words) it sometimes fails, and I don't know why.
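You might use a pattern that matches BEGIN at the start of a line, then every following line that does not start with BEGIN or END, up to the next line that starts with END (this needs re.MULTILINE so that ^ matches at each line start):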
^BEGIN\b(.*(?:\r?\n(?!(?:BEGIN|END)\b).*)*)\r?\nEND
If you want to include BEGIN and END in the match, you can omit the capturing group:
^BEGIN\b.*(?:\r?\n(?!(?:BEGIN|END)\b).*)*\r?\nEND
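One likely reason the pattern from the question breaks on larger inputs is that (.*) with re.DOTALL is greedy: if the text contains more than one BEGIN/END block, it runs from the first BEGIN to the last line starting with END. This is only a guess at the cause, since the failing input was not shown. A minimal sketch of the difference, assuming a made-up two-block input (the text below is hypothetical, not from the question):

```python
import re

# Hypothetical input with two BEGIN/END blocks (not from the question).
text = (
    "header 1 2 3\n"
    "BEGIN first\n"
    "a b c\n"
    "END first\n"
    "middle\n"
    "BEGIN second\n"
    "d e f\n"
    "END second\n"
)

# Pattern from the question: greedy (.*) with DOTALL.
greedy = re.compile(r"BEGIN[\-\s]+(.*)\nEND", re.DOTALL)

# Pattern from this answer: each match stops at the next line
# that starts with BEGIN or END.
per_block = re.compile(
    r"^BEGIN\b(.*(?:\r?\n(?!(?:BEGIN|END)\b).*)*)\r?\nEND", re.MULTILINE
)

greedy_groups = greedy.findall(text)
block_groups = per_block.findall(text)

print(len(greedy_groups))  # a single match that swallows both blocks
print(block_groups)        # one captured string per block
```

With this input the greedy pattern returns one capture that still contains "BEGIN second", while the second pattern returns a separate capture for each block.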
Code example
import re
regex = r"^BEGIN\b(.*(?:\r?\n(?!(?:BEGIN|END)\b).*)*)\r?\nEND"
test_str = ("Some_text Some_text 1 2 3\n"
"BEGIN Some_text Some_text\n"
"44 76 1321\n"
"Some_text Some_text\n"
"END Some_text\n"
"Some_Text Some_text 1 4 5\n")
print(re.findall(regex, test_str, re.MULTILINE))
Output
[' Some_text Some_text\n44 76 1321\nSome_text Some_text']
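If the data is already a Python list of lines, a regex is not strictly needed. The following is a rough sketch of a plain line scanner, not the regex approach above; the function name blocks_between and its parameters are made up for illustration. It assumes BEGIN and END appear as the first word of a line, and it keeps the rest of the BEGIN line the same way the capturing group does.

```python
def blocks_between(lines, begin="BEGIN", end="END"):
    """Return one list of lines per begin..end block (hypothetical helper)."""
    blocks = []     # finished blocks
    current = None  # lines of the open block, or None when outside a block
    for line in lines:
        parts = line.split(None, 1)
        first = parts[0] if parts else ""
        if current is None:
            if first == begin:
                # Keep the rest of the BEGIN line, like the regex capture.
                current = [parts[1] if len(parts) > 1 else ""]
        elif first == end:
            blocks.append(current)
            current = None
        else:
            current.append(line)
    # A block that never reaches an END line is dropped.
    return blocks


sample_lines = [
    "Some_text Some_text 1 2 3",
    "BEGIN Some_text Some_text",
    "44 76 1321",
    "Some_text Some_text",
    "END Some_text",
    "Some_Text Some_text 1 4 5",
]
result = blocks_between(sample_lines)
print(result)
```

Unlike the regex, this sketch treats a BEGIN line inside an open block as ordinary content, so adjust that branch if nested or repeated BEGIN markers need different handling.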