
Combining -Match and -NotMatch: how do I find lines that contain "foo" but do not match "foo bar"?


I've been analyzing log files using PowerShell by searching for any lines that contain "error" or "exception" with foreach { $_ -match $searchStrings }, and this works.

However, I ran into log files that also include many lines with "User login error" and "User authentication error", which I want to ignore, but I can't get it to work.

When I try -Match and -NotMatch separately, they work as expected. But when I combine them, I get just a single result: True. I've also tried using Where-Object, but it just returns all lines, including the lines without "error".

The log file looks like this, for example:

28-02-2024 02:30:37 - User authentication error ('JohnsenJ') on module LabResultCodeLists
28-02-2024 10:14:27 - LabResults imported (record count: 0)
28-02-2024 10:16:53 - LabResults imported (record count:6754)
28-02-2024 13:19:03 - Server error from remote client (45eaae02-b53b-40b4-ac7f-79926eac12c4)
28-02-2024 22:34:00 - Server error from remote client (fff80a18-8317-4945-9e12-b1cacc4b4419)
29-02-2024 00:59:31 - Query error ('SELECT * FROM LABRESULTS WHERE CODE = ;')
29-02-2024 01:29:51 - LabResults imported (record count:123)
29-02-2024 01:54:35 - Access violation error at dlgLabResultVerification
29-02-2024 15:30:14 - Query error ('SELECT * FROM LABRESULTS WHERE CODE = ;')
29-02-2024 17:09:06 - User login error ('JohnsneJ') not found
29-02-2024 18:59:29 - Connection error from remote client (45eaae02-b53b-40b4-ac7f-79926eac12c4)

And the PowerShell script looks like this:

$searchStrings = "error|exception"
$excludeStrings = "User login error|User authentication error"

$file = "logfile.txt"

$lines = Get-Content $file -ReadCount 1000 |
    foreach { $_ -match $searchStrings } # count 8
    #foreach { $_ -NotMatch $excludeStrings } # count 9 
    #foreach { $_ -match $searchStrings -and $_ -NotMatch $excludeStrings} # 1: true ?!
    #Where-Object { $_ -Match $searchStrings } # incorrect: count 11 

# Any lines?
if ($lines.count -ne 0) {
    Write-Host ("Error count: $($lines.count)")
    Write-Host ($lines -join "`n")
}

How can I find just the lines that match $searchStrings but exclude any that match $excludeStrings?


Solution

  • Let's first analyze your existing code and why it behaves as it does:

    $lines = Get-Content $file -ReadCount 1000 |
        foreach { $_ -match $searchStrings } # count 8
    

    When Get-Content is used with the parameter -ReadCount 1000, it reads up to 1000 lines at a time (or fewer, if the file is shorter) and passes each batch of lines as a single array to the next command in the pipeline; without -ReadCount, it would pass the lines one by one. So inside the foreach script block, the automatic $_ variable is an array of strings instead of a single string.
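
    The difference in what $_ holds can be checked with a small experiment. This is only a sketch: the file name readcount-demo.log and its three lines are made up for the demonstration and are not part of the original log.

    ```powershell
    # Hypothetical three-line sample file, created only for this demonstration.
    $path = Join-Path ([System.IO.Path]::GetTempPath()) 'readcount-demo.log'
    Set-Content -Path $path -Value 'first line', 'second error line', 'third line'

    # With -ReadCount, each pipeline object is a batch (an array) of lines.
    $batchTypes = Get-Content $path -ReadCount 1000 |
        ForEach-Object { $_.GetType().Name }

    # Without -ReadCount, each pipeline object is a single string.
    $lineTypes = Get-Content $path |
        ForEach-Object { $_.GetType().Name }

    "Batched:      $($batchTypes -join ', ')"
    "Line by line: $($lineTypes -join ', ')"

    # Clean up the temporary file.
    Remove-Item $path
    ```

    With the batched read, the script block runs once and sees an Object[]; with the plain read, it runs three times and sees a String each time.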

    Why is this important? Because in PowerShell most of the operators work differently, depending on whether the operand on the left hand side (LHS) is a collection (like an array) or a scalar. In the case of a collection, instead of resolving to a boolean value, the operator acts as a filter that outputs only the elements of the collection that match (see about_Comparison_Operators - Common Features).
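
    A minimal sketch of that scalar-versus-collection behavior, using shortened lines from the sample log (the variable names are chosen just for this illustration):

    ```powershell
    $pattern = 'error|exception'

    # Scalar on the left-hand side: the result is a Boolean.
    $single = 'Server error from remote client'
    $scalarResult = $single -match $pattern

    # Array on the left-hand side: the result is the array of matching elements.
    $many = 'Server error from remote client', 'LabResults imported (record count: 0)', 'Query error'
    $arrayResult = $many -match $pattern

    $scalarResult.GetType().Name   # Boolean
    $arrayResult.Count             # 2 (the two lines that contain "error")
    ```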

    So why doesn't combining the -match and -NotMatch operators work as expected in the next example?

    $lines = Get-Content $file -ReadCount 1000 |
        foreach { $_ -match $searchStrings -and $_ -NotMatch $excludeStrings} 
    

    Due to operator precedence (the comparison operators -match and -NotMatch bind more tightly than the logical operator -and), PowerShell processes the expression inside foreach in this order:

    1. $temp1 = $_ -match $searchStrings
    2. $temp2 = $_ -NotMatch $excludeStrings
    3. $temp1 -and $temp2

    As I've already explained, $temp1 and $temp2 will be arrays of lines that match or don't match. What happens when an array is used in a boolean context as in $temp1 -and $temp2? An empty array resolves to $false, an array with a single element resolves to whatever that element converts to, and an array with two or more elements always resolves to $true. As both arrays here contain several lines, you get the single result $true.
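
    The three evaluation steps can be reproduced in isolation. The following is a sketch with a made-up three-line batch (shortened lines from the sample log) standing in for $_:

    ```powershell
    # Stand-in for the batch that -ReadCount would pass as $_.
    $batch = @(
        "User login error ('JohnsneJ') not found"
        'LabResults imported (record count:123)'
        'Server error from remote client'
    )

    $temp1 = $batch -match 'error|exception'                                  # 2 lines
    $temp2 = $batch -notmatch 'User login error|User authentication error'    # 2 lines

    # -and converts both arrays to Booleans, so the lines themselves are lost.
    $combined = $temp1 -and $temp2

    $combined            # True
    [bool]@()            # False: empty array
    [bool]@('a', 'b')    # True: two or more elements
    ```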

    Solution A - minimal changes

    To fix your existing code with minimal changes, just chain the -match and -notmatch operators, without using a boolean operator:

    $lines = Get-Content $file -ReadCount 1000 |
        foreach { $_ -match $searchStrings -notmatch $excludeStrings }
    

    Operators of equal precedence are evaluated from left to right, so PowerShell first resolves $_ -match $searchStrings to an array of matching elements, then feeds this filtered array to the -notmatch subexpression to filter it again, excluding the unwanted elements.
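
    To see the chained filter at work without a file, here is a sketch that applies it to an in-memory batch of five lines copied from the sample log ($batch plays the role of $_ and is named only for this example):

    ```powershell
    $searchStrings  = 'error|exception'
    $excludeStrings = 'User login error|User authentication error'

    # Five lines from the sample log, standing in for one -ReadCount batch.
    $batch = @(
        "28-02-2024 02:30:37 - User authentication error ('JohnsenJ') on module LabResultCodeLists"
        '28-02-2024 10:14:27 - LabResults imported (record count: 0)'
        '28-02-2024 13:19:03 - Server error from remote client (45eaae02-b53b-40b4-ac7f-79926eac12c4)'
        "29-02-2024 17:09:06 - User login error ('JohnsneJ') not found"
        '29-02-2024 18:59:29 - Connection error from remote client (45eaae02-b53b-40b4-ac7f-79926eac12c4)'
    )

    # First filter only: keeps the four lines that contain "error".
    $afterMatch = $batch -match $searchStrings

    # Chained filters: the two user-related lines are dropped as well.
    $result = $batch -match $searchStrings -notmatch $excludeStrings

    $afterMatch.Count   # 4
    $result.Count       # 2 (the "Server error" and "Connection error" lines)
    ```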

    Solution B - idiomatic

    While Solution A works, it's not idiomatic PowerShell code. Unless there is a very good reason to use -ReadCount (e.g. if performance is paramount), I'd use the more expressive Where-Object filtering pattern instead:

    $lines = Get-Content $file |
        Where-Object { $_ -match $searchStrings -and $_ -notmatch $excludeStrings }
    

    Note that in this case using the boolean operator -and is correct, since without -ReadCount, the Get-Content command passes each line one by one to the next pipeline command. So $_ within the Where-Object filter expression refers to a single string object only, and the -match and -notmatch subexpressions both resolve to a boolean value, which results in the expected outcome when combined using -and.
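
    The same reasoning also explains why the Where-Object attempt from the question returned every line. A sketch with a made-up in-memory array ($logLines, shortened lines from the sample log): piping the array enumerates it line by line, as Get-Content does without -ReadCount, while the unary comma sends it as one batch, as -ReadCount does.

    ```powershell
    $searchStrings  = 'error|exception'
    $excludeStrings = 'User login error|User authentication error'

    $logLines = @(
        "User authentication error ('JohnsenJ') on module LabResultCodeLists"
        'LabResults imported (record count: 0)'
        'Server error from remote client'
        "User login error ('JohnsneJ') not found"
        'Connection error from remote client'
    )

    # Line by line: $_ is a string, so both subexpressions are Booleans.
    $perLine = $logLines |
        Where-Object { $_ -match $searchStrings -and $_ -notmatch $excludeStrings }

    # One batch: $_ is the whole array, the filter yields a non-empty array,
    # which counts as $true, so Where-Object passes the entire batch through.
    $perBatch = , $logLines |
        Where-Object { $_ -match $searchStrings }

    $perLine.Count    # 2
    $perBatch.Count   # 5 (all lines, including the one without "error")
    ```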