
PowerShell Regex: Capturing text between two strings that are on multiple lines


I may have something like this:

FIRST|[some text here] (newline)
[insert text here] (newline)
SECOND|A (newline)

FIRST|[some text here] (newline)
[insert text here] (newline)
SECOND|B (newline)

FIRST|[some text here] (newline)
[insert text here] (newline)
SECOND|A (newline)

FIRST|[some text here] (newline)
[insert text here] (newline)
SECOND|B (newline)

FIRST|[some text here] (newline)
[insert text here] (newline)
SECOND|A (newline)

I only want to capture everything from FIRST to SECOND|B and exclude anything from FIRST to SECOND|A. The order in this post is just an example and may differ in the files I am working with. The text in brackets could be words, digits, special characters, etc., and (newline) just indicates that the text continues on a new line.

I have tried FIRST[\s\S]+SECOND\|B (https://regex101.com/r/CwzCyz/2), but that matches from the first FIRST to the last SECOND|B. It works on regex101.com but not in my PowerShell ISE application, which I am guessing is because I have the regex101 flavor set to PCRE (PHP).


Solution

  • FIRST\|(?:(?!SECOND\|[^B])[\S\s])*?SECOND\|B

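    This will not match a FIRST| whose block ends in SECOND|A (or in SECOND| followed by anything other than B), because the negative lookahead stops the match from running past such a marker.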

    https://regex101.com/r/e0CG9B/1

    Expanded (the whitespace is for readability only and requires the (?x) option to run as written):

     FIRST \| 
     (?:
          (?! SECOND \| [^B] )
          [\S\s] 
     )*?
     SECOND \| B
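
A minimal PowerShell sketch of applying this pattern, assuming the whole file is available as one string. The sample text and the variable names ($text, $pattern, $blocks) are illustrative and not from the original post, and the commented-out path is hypothetical. When reading from disk, Get-Content needs -Raw; without it the content arrives as an array of lines and a multi-line pattern has nothing to span.

```powershell
# Sample input modeled on the question; a here-string keeps the line breaks.
# To read a real file instead (hypothetical path):
#   $text = Get-Content -Raw -Path 'C:\path\to\input.txt'
$text = @'
FIRST|alpha
line one
SECOND|A

FIRST|beta
line two
SECOND|B

FIRST|gamma
line three
SECOND|A

FIRST|delta
line four
SECOND|B
'@

# Single-quoted so PowerShell does not expand anything inside the pattern.
$pattern = 'FIRST\|(?:(?!SECOND\|[^B])[\S\s])*?SECOND\|B'

# [regex]::Matches returns every match; the -match operator reports only the first.
# @(...) keeps the result an array even when there is a single match.
$blocks = @([regex]::Matches($text, $pattern) | ForEach-Object { $_.Value })

$blocks.Count   # 2: the "beta" and "delta" blocks
$blocks[0]      # starts at FIRST|beta and ends at SECOND|B
```

The FIRST|alpha and FIRST|gamma blocks are skipped because the lookahead refuses to step over SECOND|A, so no match can start at those positions.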
    

    If you need only the innermost FIRST / SECOND pair, so that a match never contains another FIRST| or SECOND|, use a different pattern:

    FIRST\|(?:(?!(?:FIRST|SECOND)\|)[\S\s])*SECOND\|B

    https://regex101.com/r/qoT8U1/1
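
The difference between the two patterns only shows up when a FIRST| has no closing SECOND| of its own. The sketch below uses a made-up input of that shape; the variable names ($lazy, $inner, and so on) are illustrative and not part of the original answer.

```powershell
# A stray FIRST| with no SECOND| of its own, followed by a complete block.
$text = @'
FIRST|outer
stray text
FIRST|inner
payload
SECOND|B
'@

$lazy  = 'FIRST\|(?:(?!SECOND\|[^B])[\S\s])*?SECOND\|B'
$inner = 'FIRST\|(?:(?!(?:FIRST|SECOND)\|)[\S\s])*SECOND\|B'

# [regex]::Match returns the first match; .Value is the matched text.
$lazyMatch  = [regex]::Match($text, $lazy).Value
$innerMatch = [regex]::Match($text, $inner).Value

$lazyMatch    # begins at FIRST|outer, so it swallows the stray block
$innerMatch   # begins at FIRST|inner, the innermost pair only
```

The second pattern forbids both FIRST| and SECOND| inside the match, so a match can only start at the FIRST| nearest to its SECOND|B.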