I have text like this:
var12.1
a
a
dsa
88
123!!!
secondVar12.1
The lines between var and secondVar may differ (and there may be a different number of them).
How can I extract them with a regexp?
I'm trying something like this, to no avail:
re.findall(r"^var[0-9]+\.[0-9]+[\n.]+^secondVar[0-9]+\.[0-9]+", str, re.MULTILINE)
You can grab it with:
var(\d+(?:(?!var\d).)*?)secondVar
The re.S (or re.DOTALL) modifier must be used with this regex so that the dot (.) can match a newline. The text between the delimiters will be in Group 1.
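To illustrate what the flag changes, here is a minimal sketch. The sample string is made up for the illustration, and a plain lazy .*? is used instead of the tempered token to keep the focus on the flag:

```python
import re

# Hypothetical three-line sample, not taken from the question
sample = "var1\nx\nsecondVar"

# Without re.DOTALL, . does not match "\n", so the match cannot span lines
no_flag = re.findall(r"var\d+(.*?)secondVar", sample)

# With re.DOTALL (alias re.S), . also matches "\n"
with_flag = re.findall(r"var\d+(.*?)secondVar", sample, re.DOTALL)

print(no_flag)    # []
print(with_flag)  # ['\nx\n']
```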
NOTE: The closest match will be found thanks to the (?:(?!var\d).)*? tempered greedy token (i.e. if another var + a digit appears after var + 1+ digits, then the match will be between the second var and secondVar).
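A small sketch of that behaviour, using a made-up one-line sample and a capturing group around the token so the difference is visible. A plain lazy .*? starts at the first var, while the tempered token refuses to run across a second var + digit:

```python
import re

# Hypothetical sample with two opening delimiters before one closing delimiter
sample = "var1 junk var2 data secondVar"

# Plain lazy dot: matches from the FIRST var1, swallowing var2 on the way
plain = re.findall(r"var\d+(.*?)secondVar", sample, re.DOTALL)

# Tempered token: each consumed char must not start "var" + digit,
# so the attempt from var1 fails and the match starts at var2
tempered = re.findall(r"var\d+((?:(?!var\d).)*?)secondVar", sample, re.DOTALL)

print(plain)     # [' junk var2 data ']
print(tempered)  # [' data ']
```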
NOTE2: You might want to use \b word boundaries so that var and secondVar match only at the start of a word:
\bvar(\d+(?:(?!var\d).)*?)\bsecondVar
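As a rough illustration of the word-boundary variant (the sample string, including the name myvar1, is invented for this sketch): without \b, the var inside a longer word is also accepted as a starting delimiter.

```python
import re

# Hypothetical sample: "myvar1" merely contains "var1", "var2" is a real delimiter
sample = "myvar1 x secondVar var2 y secondVar"

# No word boundary: "var1" inside "myvar1" counts as a starting delimiter
loose = re.findall(r"var\d+((?:(?!var\d).)*?)secondVar", sample, re.DOTALL)

# With \b: a delimiter must begin at a word boundary, so "myvar1" is skipped
strict = re.findall(r"\bvar\d+((?:(?!var\d).)*?)\bsecondVar", sample, re.DOTALL)

print(loose)   # [' x ', ' y ']
print(strict)  # [' y ']
```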
REGEX EXPLANATION
var - match the starting delimiter
\d+ - 1+ digits
(?:(?!var\d).)*? - a tempered greedy token that matches any char, 0 or more (but as few as possible) repetitions, that does not start a char sequence var and a digit
secondVar - match secondVar literally.
import re
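# Group 1 captures everything between the var and secondVar delimiters
p = re.compile(r'var(\d+(?:(?!var\d).)*?)secondVar', re.DOTALL)
test_str = "var12.1\na\na\ndsa\n\n88\n123!!!\nsecondVar12.1\nvar12.1\na\na\ndsa\n\n88\n123!!!\nsecondVar12.1"
# With a single capturing group, findall returns the Group 1 text of each match
print(p.findall(test_str))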
Result for the input string (I doubled it for demo purposes):
['12.1\na\na\ndsa\n\n88\n123!!!\n', '12.1\na\na\ndsa\n\n88\n123!!!\n']
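As for why the attempt in the question finds nothing, a likely reason is the [\n.]+ part: inside a character class the dot is a literal ".", so that class matches only newlines and dots, not arbitrary characters. A quick sketch to check this (the variable name text is used instead of str here, to avoid shadowing the built-in):

```python
import re

text = "var12.1\na\na\ndsa\n88\n123!!!\nsecondVar12.1"

# The pattern from the question: [\n.]+ can consume only "\n" and "." characters,
# so it never gets past the "a" on the second line
attempt = re.findall(
    r"^var[0-9]+\.[0-9]+[\n.]+^secondVar[0-9]+\.[0-9]+", text, re.MULTILINE
)

# What [\n.]+ actually matches on a tiny made-up string
dots_and_newlines = re.findall(r"[\n.]+", "a.b\n.c")

print(attempt)            # []
print(dots_and_newlines)  # ['.', '\n.']
```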