java, regex, regex-lookarounds, regex-greedy, capture-group

Regex to capture text with unknown number of repeated groups in between


I'm trying to parse the number that follows "Dining:" in the following text, under SECOND LEVEL. So '666' should be returned.

    MAIN LEVEL
        Entrance: 11
        Dining: 33

    SECOND LEVEL
        Entrance: 4444
        Living: 5555
        Dining: 666

    THIRD LEVEL
        Dining: 999
        Kitchen: 000
        Family: 33332

If I use something like (?:\bDining:\s)(.*\b), it captures the first occurrence, under MAIN LEVEL. I'm therefore trying to specify SECOND LEVEL in the regex, followed by a repeating pattern of a newline, multiple spaces, and then any text, until Dining: is found. This demo illustrates the two problems I encounter. The regex used is: (?:\bSECOND\sLEVEL(\n\s+.*)*Dining:)(.*\b)

  1. A "Catastrophic backtracking" error appears until you delete the very last line of the demo input, which contains Laundry: 1. Is this caused by too many matches or something?
  2. Once you delete that line, the regex captures only the last match, under OTHER LEVEL, returning '2' rather than the match under SECOND LEVEL.

Sometimes Dining: will not exist under SECOND LEVEL and therefore nothing should be returned.

What is a regex that will only capture the SECOND LEVEL's Dining: number, and if it doesn't exist then returns nothing? Straight up regex preferred, no looping in Java if possible. Thanks


Solution

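  • Use a regex built on a negative lookahead. The group (?:(?!\n\n)[\s\S])* consumes one character at a time, newlines included, but only while a blank line (two consecutive newlines) does not start at the current position, so the match cannot leave the SECOND LEVEL block. If that block has no Dining: entry, the regex does not match at all.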

    "(?m)^\\s*\\bSECOND LEVEL\\n(?:(?!\\n\\n)[\\s\\S])*\\bDining:\\s*(\\d+)"
    

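As a rough sketch of how the pattern might be used from Java: the class name SecondLevelDining and the helper findDining are made up for illustration, and the sample input assumes \n line endings with a completely empty line between levels (trailing spaces on that line, or \r\n endings, would need an adjusted pattern).

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class; only the pattern itself comes from the answer.
class SecondLevelDining {
    // (?:(?!\n\n)[\s\S])* consumes any character, newlines included,
    // but stops at the first blank line, i.e. at the end of the block.
    private static final Pattern DINING = Pattern.compile(
        "(?m)^\\s*\\bSECOND LEVEL\\n(?:(?!\\n\\n)[\\s\\S])*\\bDining:\\s*(\\d+)");

    // Returns the digits after "Dining:" in the SECOND LEVEL block, if present.
    public static Optional<String> findDining(String text) {
        Matcher m = DINING.matcher(text);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Assumed input: "\n" line endings and truly empty separator lines.
        String text = String.join("\n",
            "MAIN LEVEL",
            "    Entrance: 11",
            "    Dining: 33",
            "",
            "SECOND LEVEL",
            "    Entrance: 4444",
            "    Living: 5555",
            "    Dining: 666",
            "",
            "THIRD LEVEL",
            "    Dining: 999",
            "    Kitchen: 000",
            "    Family: 33332");

        System.out.println(findDining(text).orElse("(no match)"));
    }
}
```

Returning an Optional keeps the "nothing should be returned" case explicit: when SECOND LEVEL has no Dining: line, find() fails and the caller gets Optional.empty() instead of a value from another level.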