
Matching a block that spans an unknown number of lines with Select-String in PowerShell?


I've been able to match multiple lines, but only if I know how many lines are coming and what the content of those lines is:

Select-String -Pattern "^Timestamp: 3/27/2021.*`n`n|^Message:.*errorText:" -Context 2 -LiteralPath .\SomeLog.log

Is there a way to match multiple lines without knowing what is in between?

For instance, to match:

[START]
...
...
[END]

I read something about changing the regex options with (?sme), but it doesn't seem to work.

I was trying something like the following:

Select-String -Pattern '(sme?)\[START\].*\n(.*\n)\+\[END\]'


Solution

  • To make Select-String match multiline substrings:

    • You must provide the input as a single, multiline string, which is what Get-Content's -Raw switch provides.

    • As needed, in the regex passed to Select-String's -Pattern parameter, use inline regex option m (multi-line) to make ^ and $ match the beginning and end of each line ((?m)) and/or option s (single-line) to make . match newline characters ("`n") too ((?s)); you can activate both with (?sm).
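To make the effect of each inline option concrete, here is a minimal sketch that uses the -match operator on a small hypothetical three-line string (the string and variable names are illustrative only, not part of the question):

```powershell
# Hypothetical three-line input with LF-only newlines.
$text = "[START]`nfoo`n[END]"

# Without options, ^ and $ anchor only at the start/end of the whole string.
$noOptionAnchors = $text -match '^foo$'                  # False
# (?m): ^ and $ also match at the start/end of each line.
$multilineAnchors = $text -match '(?m)^foo$'             # True

# Without options, . does not match "`n", so the match cannot cross lines.
$noOptionDot = $text -match '\[START\].+\[END\]'         # False
# (?s): . matches "`n" too, so the match can span lines.
$singlelineDot = $text -match '(?s)\[START\].+\[END\]'   # True
```

Combining both as (?sm) gives line-anchored delimiters plus a . that can cross line boundaries.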

    Here's an example with a multiline here-string serving as input, instead of, say,
    Get-Content -Raw file.txt:

    (@'
    before
    [START]
    ...1
    ...2
    [END]
    after
    [START]
    ...3
    ...4
    [END]
    done
    '@ | 
      Select-String  -AllMatches -Pattern '(?sm)^\[START\]$.+?^\[END\]$'
    ).Matches.Value -join "`n------------------------`n"
    

    Note: Strictly speaking, only [, not also ], requires escaping with \.

    If you only want to find the first block of matching lines, omit -AllMatches. (Thanks, Wiktor Stribiżew.)
    -AllMatches requests all matches per input string; with line-by-line input, it is normally used to find multiple matches per line. Here, with a multiline input string, all (potentially multiline) matches inside it are returned.

    Output:

    [START]
    ...1
    ...2
    [END]
    ------------------------
    [START]
    ...3
    ...4
    [END]
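
The same pattern can be applied to a file read with Get-Content -Raw. The following is only a sketch: the temporary file name demo-blocks.log is hypothetical, and the file is deliberately written with LF-only newlines so that the pattern from the example above applies unchanged. It also shows why line-by-line input cannot produce a multiline match.

```powershell
# Hypothetical demo file, written with LF-only newlines.
$path = Join-Path ([IO.Path]::GetTempPath()) 'demo-blocks.log'
[IO.File]::WriteAllText($path, "before`n[START]`n...1`n...2`n[END]`nafter`n")

$pattern = '(?sm)^\[START\]$.+?^\[END\]$'

# Line-by-line input: each line is matched on its own, so nothing can span lines.
$lineByLine = @(Get-Content -LiteralPath $path | Select-String -Pattern $pattern)

# -Raw: the whole file is a single string, so the multiline block can match.
$raw = Get-Content -Raw -LiteralPath $path |
  Select-String -AllMatches -Pattern $pattern

$lineByLine.Count     # 0
$raw.Matches.Count    # 1
$raw.Matches.Value    # the block from [START] through [END]

# Clean up the demo file.
Remove-Item -LiteralPath $path
```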
    

    If you want to return only what is between the delimiter lines:

    (@'
    before
    [START]
    ...1
    ...2
    [END]
    after
    [START]
    ...3
    ...4
    [END]
    done
    '@ | 
      Select-String  -AllMatches -Pattern '(?sm)^\[START\]\r?\n(.*?)\r?\n\[END\]$'
    ).Matches.ForEach({ $_.Groups[1].Value }) -join "`n------------------------`n"
    

    Note: \r?\n matches both Windows-format CRLF and Unix-format LF-only newlines. Use \r\n / \n to match only the former / latter.

    Output:

    ...1
    ...2
    ------------------------
    ...3
    ...4
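
One caveat worth checking against your own files: in .NET regexes, $ in multi-line mode matches before "`n" but not before "`r`n", so a trailing $ can fail on CRLF input even when \r?\n is used between the lines. The sketch below uses hypothetical test strings and an adjusted pattern of my own (adding \r? before $) to illustrate this; treat it as an assumption-laden demo rather than a definitive fix.

```powershell
# Hypothetical inputs: the same block with LF-only and with CRLF newlines.
$lf   = "[START]`n...1`n[END]`n"
$crlf = "[START]`r`n...1`r`n[END]`r`n"

# Pattern as used above: \r?\n between lines, plain $ at the end.
$pattern = '(?sm)^\[START\]\r?\n(.*?)\r?\n\[END\]$'
$lfMatches   = $lf   -match $pattern    # True
$crlfMatches = $crlf -match $pattern    # False: $ does not match before "`r"

# Adjusted pattern (assumption: a trailing CR should be tolerated).
$crlfSafePattern = '(?sm)^\[START\]\r?\n(.*?)\r?\n\[END\]\r?$'
$crlfSafeMatches = $crlf -match $crlfSafePattern   # True
$between = $Matches[1]                             # ...1
```

If the captured text spans several lines of a CRLF file, the inner lines still carry their own "`r`n" sequences.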