
How to use FINDSTR in PowerShell to find lines where all words in the search string match in any order


The following findstr.exe command almost does what I want, but not quite:

findstr /s /i /c:"word1 word2 word3" *.abc

I have used:

  • /s to search all subfolders.
  • /c: to use the specified text as a literal search string.
  • /i to make the search case-insensitive.
  • *.abc to search only files of type abc.

The above treats "word1 word2 word3" as a single literal string, and therefore only finds lines that contain the words next to each other, in that exact order.

By contrast, I want all words to match individually, in any order (AND logic, conjunction).

If I remove /c: from the command above, then lines matching any of the words are returned (OR logic, disjunction), which is not what I want.

Can this be done in PowerShell?


Solution

  • You can use Select-String to do a regex-based search through multiple files.

    To require that a line contain all of several search terms with a single regular expression, you'll have to use lookaround assertions, specifically one positive lookahead per term:

    Get-ChildItem -Filter *.abc -Recurse |Select-String -Pattern '^(?=.*\bword1\b)(?=.*\bword2\b)(?=.*\bword3\b).*$'
    

    In the above example, this is what the first command in the pipeline does:

    Get-ChildItem -Filter *.abc -Recurse
    

    • Get-ChildItem lists the items in the current directory
    • -Filter *.abc returns only files whose names end in .abc
    • -Recurse searches all subfolders as well

    We then pipe the resulting FileInfo objects to Select-String, which (like findstr /i) is case-insensitive by default, and use the following regex pattern:

    ^(?=.*\bword1\b)(?=.*\bword2\b)(?=.*\bword3\b).*$
    ^              # start of string  
     (?=           # open positive lookahead assertion containing
        .*         # any number of any characters (like * in wildcard matching)
          \b       # word boundary
            word1  # the literal string "word1"
          \b       # word boundary
     )             # close positive lookahead assertion
     ...           # repeat for remaining words
     .*            # any number of any characters
    $              # end of string
    

    Each lookahead only asserts that its word occurs somewhere in the line; it doesn't consume any characters, so every assertion is evaluated from the start of the line. That is why the order of the words in the line doesn't matter.
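    To try the pattern without touching any files, the sketch below runs it with PowerShell's -match operator against a few made-up sample lines (the sample strings are hypothetical). Like Select-String, -match is case-insensitive by default, which corresponds to findstr's /i.

```powershell
# The "all words, in any order" pattern from the answer
$pattern = '^(?=.*\bword1\b)(?=.*\bword2\b)(?=.*\bword3\b).*$'

# Hypothetical sample lines
$lines = @(
    'word1 word2 word3'           # all three, original order
    'Word3 and word1, then word2' # all three, different order and case
    'word1 word2'                 # word3 missing
    'word1 word2 word30'          # "word30" is not the whole word "word3"
)

# -match is case-insensitive by default, like findstr /i
$matching = $lines | Where-Object { $_ -match $pattern }
$matching
```

    Only the first two lines are returned: the third lacks word3, and in the fourth the \b after word3 keeps "word30" from counting.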


    If you instead want to match lines that contain any of the words, a simple non-capturing group with alternation is enough:

    Get-ChildItem -Filter *.abc -Recurse |Select-String -Pattern '\b(?:word1|word2|word3)\b'
    
    \b(?:word1|word2|word3)\b
    \b          # word boundary
      (?:       # open non-capturing group
         word1  # the literal string "word1"
         |      # or
         word2  # the literal string "word2"
         |      # or
         word3  # the literal string "word3"
      )         # close non-capturing group
    \b          # word boundary
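    The same kind of quick check, again with hypothetical sample lines, shows the difference: a line passes as soon as any one of the words appears as a whole word.

```powershell
# The "any of the words" pattern from the answer
$anyPattern = '\b(?:word1|word2|word3)\b'

# Hypothetical sample lines
$lines = @(
    'only word2 here'   # one of the words
    'word3 word1'       # two of the words
    'word10 word20'     # no whole-word match
)

# Keep only the lines where at least one alternative matches
$matching = $lines | Where-Object { $_ -match $anyPattern }
$matching
```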
    

    Rather than writing these patterns by hand every time, you can wrap them in a simple proxy function.

    I generated the param block and most of the body of the Select-Match function definition below with:

    $slsmeta = [System.Management.Automation.CommandMetadata]::new((Get-Command Select-String))
    [System.Management.Automation.ProxyCommand]::Create($slsmeta)
    

    I then removed the parameters it doesn't need (including -AllMatches and -Pattern) and added the pattern generator (see the inline comments):

    function Select-Match
    {
        [CmdletBinding(DefaultParameterSetName='Any', HelpUri='http://go.microsoft.com/fwlink/?LinkID=113388')]
        param(
            [Parameter(Mandatory=$true, Position=0)]
            [string[]]
            ${Substring},
    
            [Parameter(Mandatory=$true, ValueFromPipelineByPropertyName=$true)]
            [Alias('PSPath')]
            [string[]]
            ${LiteralPath},
    
            [Parameter(ParameterSetName='Any')]
            [switch]
            ${Any},
    
            [Parameter(ParameterSetName='All')]
            [switch]
            ${All},
    
            [switch]
            ${CaseSensitive},
    
            [switch]
            ${NotMatch},
    
            [ValidateNotNullOrEmpty()]
            [ValidateSet('unicode','utf7','utf8','utf32','ascii','bigendianunicode','default','oem')]
            [string]
            ${Encoding},
    
            [ValidateNotNullOrEmpty()]
            [ValidateCount(1, 2)]
            [ValidateRange(0, 2147483647)]
            [int[]]
            ${Context}
        )
    
        begin
        {
            try {
                $outBuffer = $null
                if ($PSBoundParameters.TryGetValue('OutBuffer', [ref]$outBuffer))
                {
                    $PSBoundParameters['OutBuffer'] = 1
                }
    
                # Escape literal input strings
                $EscapedStrings = foreach($term in $PSBoundParameters['Substring']){
                    [regex]::Escape($term)
                }
    
                # Construct pattern based on whether -Any or -All was specified 
                if($PSCmdlet.ParameterSetName -eq 'Any'){
                    $Pattern = '\b(?:{0})\b' -f ($EscapedStrings -join '|')
                } else {
                    $Clauses = foreach($EscapedString in $EscapedStrings){
                        # One lookahead per term, so the terms may appear in any order
                        '(?=.*\b{0}\b)' -f $EscapedString
                    }
                    $Pattern = '^{0}.*$' -f ($Clauses -join '')
                }
    
                # Remove the parameters Select-String doesn't know about (Substring, Any, All)
                foreach($name in 'Substring','Any','All'){
                    $PSBoundParameters.Remove($name) |Out-Null
                }
    
                # Add the Pattern parameter argument
                $PSBoundParameters['Pattern'] = $Pattern
    
                $wrappedCmd = $ExecutionContext.InvokeCommand.GetCommand('Microsoft.PowerShell.Utility\Select-String', [System.Management.Automation.CommandTypes]::Cmdlet)
                $scriptCmd = {& $wrappedCmd @PSBoundParameters }
                $steppablePipeline = $scriptCmd.GetSteppablePipeline($myInvocation.CommandOrigin)
                $steppablePipeline.Begin($PSCmdlet)
            } catch {
                throw
            }
        }
    
        process
        {
            try {
                $steppablePipeline.Process($_)
            } catch {
                throw
            }
        }
    
        end
        {
            try {
                $steppablePipeline.End()
            } catch {
                throw
            }
        }
        <#
    
        .ForwardHelpTargetName Microsoft.PowerShell.Utility\Select-String
        .ForwardHelpCategory Cmdlet
    
        #>
    
    }
    

    Now you can use it like this, and it'll behave almost like Select-String (leave out -All, or pass -Any, to match lines containing any of the words instead):

    Get-ChildItem -Filter *.abc -Recurse |Select-Match word1,word2,word3 -All
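    If you'd rather not maintain a proxy function, the pattern-building step can also be pulled out on its own. The sketch below uses a hypothetical helper, New-WordPattern (not part of PowerShell), that mirrors the generator inside Select-Match: it escapes each term with [regex]::Escape and then builds either the lookahead pattern (-All) or the alternation pattern (the default). Note that \b only behaves as expected for terms that start and end with word characters.

```powershell
# Hypothetical helper mirroring the pattern generator inside Select-Match
function New-WordPattern {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, Position = 0)]
        [string[]]$Substring,

        [switch]$All
    )

    # Escape regex metacharacters so each term is matched literally
    $escaped = foreach ($term in $Substring) { [regex]::Escape($term) }

    if ($All) {
        # One lookahead per term: the order of the terms in the line doesn't matter
        $clauses = foreach ($e in $escaped) { '(?=.*\b{0}\b)' -f $e }
        '^{0}.*$' -f ($clauses -join '')
    } else {
        # Alternation inside a non-capturing group: any single term may match
        '\b(?:{0})\b' -f ($escaped -join '|')
    }
}

New-WordPattern word1, word2 -All   # ^(?=.*\bword1\b)(?=.*\bword2\b).*$
New-WordPattern word1, 'a.b'        # \b(?:word1|a\.b)\b
```

    Its output can be passed straight to the built-in cmdlet, e.g. Get-ChildItem -Filter *.abc -Recurse |Select-String -Pattern (New-WordPattern word1,word2,word3 -All).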