Tags: regex, pattern-matching, regex-group, regex-lookarounds

Regex: select the names on the lines after "Match" (until the level changes)


I have a text file with several levels (structured by tabs), and I need to select certain values from it. Here is an example. I have been trying for a very long time but can't find a solution.

        Connection
            Match
                Fridolin
                Marten
    Connection
            Inventory
                        Fill Up
            Fill Up
        Match
            Peter
            Marcus
        Storage
                Room 1
                Room 2
                Room 3
            Match
                Albert
                Jonas
                Hans
    List
    Match
        Peter
        Marcus

I want to select every name on the lines that follow "Match", as long as those lines have the same number of tabs in front of them, and stop as soon as the next level starts (a different number of tabs), for example when "Connection" appears. The names that follow "Match" are always on the same level. I can't use multiline mode for this. For example, in this block Fridolin and Marten should be selected, and the selection should stop at "Connection":
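To pin down the rule described above, here is a minimal Python sketch of it as a plain line-by-line scan instead of a regex, useful only as a reference to check a regex's output against. It assumes the indentation consists solely of tab characters, one per level; the tab counts in `LEVELS` are reconstructed from the indentation shown in the sample, and `leading_tabs` and `names_by_indentation` are hypothetical helper names.

```python
# Sample data from the question, rebuilt with real tabs.
# Assumption: one tab per level, counts reconstructed from the displayed indentation.
LEVELS = [
    (2, "Connection"), (3, "Match"), (4, "Fridolin"), (4, "Marten"),
    (1, "Connection"), (3, "Inventory"), (6, "Fill Up"), (3, "Fill Up"),
    (2, "Match"), (3, "Peter"), (3, "Marcus"),
    (2, "Storage"), (4, "Room 1"), (4, "Room 2"), (4, "Room 3"),
    (3, "Match"), (4, "Albert"), (4, "Jonas"), (4, "Hans"),
    (1, "List"), (1, "Match"), (2, "Peter"), (2, "Marcus"),
]
SAMPLE = "\n".join("\t" * depth + text for depth, text in LEVELS)


def leading_tabs(line: str) -> str:
    """Return the run of tab characters at the start of the line."""
    return line[: len(line) - len(line.lstrip("\t"))]


def names_by_indentation(text: str):
    """Return one list of names per "Match" line (hypothetical helper)."""
    lines = text.splitlines()
    blocks = []
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.strip() != "Match" or i + 1 >= len(lines):
            i += 1
            continue
        # The first line after "Match" fixes the indentation of the whole block.
        block_indent = leading_tabs(lines[i + 1])
        if len(block_indent) <= len(leading_tabs(line)):
            # The next line is not nested under "Match": nothing to collect.
            i += 1
            continue
        names = []
        j = i + 1
        # Collect lines until the indentation changes (deeper or shallower).
        while j < len(lines) and leading_tabs(lines[j]) == block_indent:
            names.append(lines[j].strip())
            j += 1
        blocks.append(names)
        i = j
    return blocks


print(names_by_indentation(SAMPLE))
# [['Fridolin', 'Marten'], ['Peter', 'Marcus'], ['Albert', 'Jonas', 'Hans'], ['Peter', 'Marcus']]
```

Treating any change in indentation, deeper or shallower, as the end of a block mirrors the requirement that all names after "Match" sit on one level.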

            Match
                Fridolin
                Marten
    Connection
    (?<=Match[\r\n]+\t\t?\t?\t?\t?)([ a-zA-ZäöüÄÖÜßé0-9\.-/\-])+

This is the regex I have so far; it selects only the first name that follows "Match". I don't know how to select the following names and stop when the level changes.


Solution

  • Try this:

    (?<=Match)\n(\s+)\w+(?:\n\1\w+)+
    


    The regular expression matches as follows:

    Node      Explanation
    (?<=      look behind to see if there is:
    Match     'Match'
    )         end of look-behind
    \n        '\n' (newline)
    (         group and capture to \1:
    \s+       whitespace (\n, \r, \t, \f, and " "), 1 or more times (as many as possible)
    )         end of \1
    \w+       word characters (a-z, A-Z, 0-9, _), 1 or more times (as many as possible)
    (?:       group, but do not capture, 1 or more times (as many as possible):
    \n        '\n' (newline)
    \1        what was matched by capture \1
    \w+       word characters (a-z, A-Z, 0-9, _), 1 or more times (as many as possible)
    )+        end of grouping
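A quick way to check the pattern above is to run it over the question's sample data. This sketch uses Python's `re` module (other engines with lookbehind and backreferences should behave similarly, but that is an assumption); `LEVELS`, `SAMPLE` and `names_after_match` are hypothetical names, and the tab counts are reconstructed from the indentation shown in the question. Each match is the whole block, including line breaks and indentation, so the names are split out afterwards.

```python
import re

# Sample data from the question, rebuilt with real tabs.
# Assumption: one tab per level, counts reconstructed from the displayed indentation.
LEVELS = [
    (2, "Connection"), (3, "Match"), (4, "Fridolin"), (4, "Marten"),
    (1, "Connection"), (3, "Inventory"), (6, "Fill Up"), (3, "Fill Up"),
    (2, "Match"), (3, "Peter"), (3, "Marcus"),
    (2, "Storage"), (4, "Room 1"), (4, "Room 2"), (4, "Room 3"),
    (3, "Match"), (4, "Albert"), (4, "Jonas"), (4, "Hans"),
    (1, "List"), (1, "Match"), (2, "Peter"), (2, "Marcus"),
]
SAMPLE = "\n".join("\t" * depth + text for depth, text in LEVELS)

# The pattern from the answer, unchanged.
PATTERN = re.compile(r"(?<=Match)\n(\s+)\w+(?:\n\1\w+)+")


def names_after_match(text):
    """Return one list of names per regex match (hypothetical helper)."""
    # The pattern expects "\n" line endings, so normalize Windows "\r\n" first.
    text = text.replace("\r\n", "\n")
    blocks = []
    for m in PATTERN.finditer(text):
        # m.group(0) looks like "\n<indent>Name\n<indent>Name...": strip the indentation.
        blocks.append([line.strip() for line in m.group(0).splitlines() if line.strip()])
    return blocks


print(names_after_match(SAMPLE))
# [['Fridolin', 'Marten'], ['Peter', 'Marcus'], ['Albert', 'Jonas', 'Hans'], ['Peter', 'Marcus']]
```

Some caveats: `(?:...)+` needs at least one repetition, so a "Match" followed by a single name is not matched (`*` instead of `+` would allow it); `\w+` does not match names containing spaces or hyphens; and `\w` is Unicode-aware in Python 3 string patterns but ASCII-only in JavaScript, which matters for names with umlauts.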