
Matching newline and any character with Python regex


I have text like this:

var12.1
a
a
dsa

88
123!!!
secondVar12.1

The text between var and secondVar may vary (and there may be a different number of them).

How can I extract it with a regex?
I'm trying something like this, to no avail:

re.findall(r"^var[0-9]+\.[0-9]+[\n.]+^secondVar[0-9]+\.[0-9]+", str, re.MULTILINE)
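A quick check of why that attempt finds nothing (a sketch; `text` is the sample from above). Inside a character class, `.` is a literal period rather than "any character", so `[\n.]+` can only consume newlines and dots and stops at the first letter. One minimal change, shown here as an alternative rather than as the answer's method, is `[\s\S]*?`, which matches any character including a newline without needing a flag.

```python
import re

# Sample text from the question
text = "var12.1\na\na\ndsa\n\n88\n123!!!\nsecondVar12.1"

# The original attempt: [\n.] means "newline or a literal period",
# so the run stops at the 'a' on the second line and ^secondVar never lines up.
attempt = re.findall(
    r"^var[0-9]+\.[0-9]+[\n.]+^secondVar[0-9]+\.[0-9]+", text, re.MULTILINE
)
print(attempt)  # []

# Inside brackets the dot does not match a letter.
print(re.fullmatch(r"[\n.]+", "\na\n"))  # None

# Alternative: [\s\S] is "whitespace or non-whitespace", i.e. any character,
# and *? keeps the run as short as possible.
fixed = re.findall(
    r"^var[0-9]+\.[0-9]+[\s\S]*?^secondVar[0-9]+\.[0-9]+", text, re.MULTILINE
)
print(fixed == [text])  # True
```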

Solution

  • You can grab it with:

    var(\d+(?:(?!var\d).)*?)secondVar
    

    The re.S (or re.DOTALL) flag must be used with this regex so that . can also match a newline. The text between var and secondVar will be in Group 1.
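    The effect of the flag can be checked directly. This sketch runs the grouped form of the pattern, var(\d+(?:(?!var\d).)*?)secondVar, once without and once with re.DOTALL; the inline (?s) flag is shown as an equivalent way to turn on the same mode.

```python
import re

text = "var12.1\na\na\ndsa\n\n88\n123!!!\nsecondVar12.1"
pattern = r"var(\d+(?:(?!var\d).)*?)secondVar"

# Without DOTALL, '.' stops at the first '\n', so secondVar is never reached.
no_flag = re.findall(pattern, text)

# With DOTALL (alias re.S), '.' also matches '\n'.
with_flag = re.findall(pattern, text, re.DOTALL)

# The inline (?s) flag at the very start of the pattern does the same thing.
inline = re.findall(r"(?s)" + pattern, text)

print(no_flag)    # []
print(with_flag)  # ['12.1\na\na\ndsa\n\n88\n123!!!\n']
print(inline == with_flag)  # True
```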

    NOTE: Thanks to the (?:(?!var\d).)*? tempered greedy token, the closest match is returned: if another var followed by a digit appears after var + 1+ digits, the match will run from that second var to secondVar.
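    To see the tempered greedy token pick the closest var, this sketch uses a made-up input (an assumption for illustration) with two var + digit openers before a single secondVar, and contrasts it with a plain lazy .*?, which has no such restriction.

```python
import re

p = re.compile(r"var(\d+(?:(?!var\d).)*?)secondVar", re.DOTALL)

# Hypothetical input: two openers, one closer.
text = "var1\nfoo\nvar2\nbar\nsecondVar"

# The attempt starting at var1 fails because the token refuses to step over
# "var2"; the engine then retries at var2 and succeeds.
print(p.findall(text))  # ['2\nbar\n']

# For contrast, a plain lazy .*? starts at the first var and spans both blocks.
lazy = re.findall(r"var(\d+.*?)secondVar", text, re.DOTALL)
print(lazy)  # ['1\nfoo\nvar2\nbar\n']
```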

    NOTE2: You might want to add \b word boundaries so that var and secondVar only match at the start of a word: \bvar(\d+(?:(?!var\d).)*?)\bsecondVar.
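    A small check of what the word boundaries change, using a hypothetical input where var appears at the end of a longer word (myvar1).

```python
import re

text = "myvar1\ndata\nsecondVar"

plain = re.findall(r"var(\d+(?:(?!var\d).)*?)secondVar", text, re.DOTALL)
bounded = re.findall(r"\bvar(\d+(?:(?!var\d).)*?)\bsecondVar", text, re.DOTALL)

print(plain)    # ['1\ndata\n'] : 'var' inside 'myvar1' is accepted
print(bounded)  # [] : \b requires 'var' to start a word
```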

    REGEX EXPLANATION

    • var - match the starting delimiter
    • \d+ - 1+ digits
    • (?:(?!var\d).)*? - a tempered greedy token that matches any char, 0 or more times but as few as possible, as long as that char does not start the sequence var + a digit
    • secondVar - match secondVar literally.
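    The same breakdown can live inside the pattern itself with re.VERBOSE. This sketch assumes the grouped form of the regex (Group 1 around \d+ and the tempered token, as the result below implies) and checks that the commented version behaves exactly like the compact one.

```python
import re

# re.VERBOSE ignores unescaped whitespace and allows # comments,
# so each piece of the regex can carry its own explanation.
verbose = re.compile(
    r"""
    var                 # the starting delimiter
    (                   # Group 1 starts
      \d+               # 1+ digits
      (?:(?!var\d).)*?  # tempered greedy token: any char, as few as possible,
                        # that does not start 'var' + a digit
    )                   # Group 1 ends
    secondVar           # the closing delimiter, matched literally
    """,
    re.VERBOSE | re.DOTALL,
)

compact = re.compile(r"var(\d+(?:(?!var\d).)*?)secondVar", re.DOTALL)

text = "var12.1\na\na\ndsa\n\n88\n123!!!\nsecondVar12.1"
print(verbose.findall(text) == compact.findall(text))  # True
```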

    IDEONE DEMO

    import re
    p = re.compile(r'var(\d+(?:(?!var\d).)*?)secondVar', re.DOTALL)
    test_str = "var12.1\na\na\ndsa\n\n88\n123!!!\nsecondVar12.1\nvar12.1\na\na\ndsa\n\n88\n123!!!\nsecondVar12.1"
    print(p.findall(test_str))
    

    Result for the input string (I doubled it for demo purposes):

    ['12.1\na\na\ndsa\n\n88\n123!!!\n', '12.1\na\na\ndsa\n\n88\n123!!!\n']
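    With a capturing group in the pattern, findall returns only the Group 1 text, as in the result above. If the whole block including the delimiters is needed too (as in the question's own attempt), a sketch with re.finditer exposes both through the match object. Note that the whole match ends at secondVar; the trailing 12.1 is not part of it.

```python
import re

p = re.compile(r"var(\d+(?:(?!var\d).)*?)secondVar", re.DOTALL)
test_str = "var12.1\na\na\ndsa\n\n88\n123!!!\nsecondVar12.1"

# Each match object carries both the full match and the captured group.
results = [(m.group(0), m.group(1), m.span()) for m in p.finditer(test_str)]

for whole, inner, span in results:
    print(repr(whole))  # entire match, delimiters included (ends at 'secondVar')
    print(repr(inner))  # Group 1 only, same as findall's output
    print(span)         # (start, end) offsets of the whole match
```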