
Powershell, how to capture argument(s) of Select-String and include with matched output


Thanks to @mklement0 for the help in getting this far, via the answer given in "Powershell search directory for code files with text matching input a txt file".

The PowerShell code below works well for finding occurrences of a long list of database field names in a source code folder.

$inputFile = 'C:\DataColumnsNames.txt'
$outputFile = 'C:\DataColumnsUsages.txt'
Get-ChildItem C:\ProjectFolder -Filter *.cs -Recurse -Force -ea SilentlyContinue |
  Select-String -Pattern (Get-Content $inputFile) | 
    Select-Object Path, LineNumber, line | 
      Export-csv $outputfile

However, many lines of source code have multiple matches, especially ADO.NET SQL statements with many field names on one line. If the matching field name were included in the output, the results would be more directly useful and would need less additional massaging, such as lining everything up with the original field name list. For example, the source line "BatchId = NewId" matches the field name list item "BatchId". Is there an easy way to include both "BatchId" and "BatchId = NewId" in the output?

I played with the Matches object, but it doesn't seem to have the information. I also tried a pipeline variable, as shown below, but $x is null.

$inputFile = 'C:\DataColumnsNames.txt'
$outputFile = 'C:\DataColumnsUsages.txt'
Get-ChildItem C:\ProjectFolder -Filter *.cs -Recurse -Force -ea SilentlyContinue |
  Select-String -Pattern (Get-Content $inputFile -PipelineVariable x) | 
    Select-Object $x, Path, LineNumber, line | 
      Export-csv $outputFile

Thanks.


Solution

  • The Microsoft.PowerShell.Commands.MatchInfo instances that Select-String outputs have a Pattern property that reflects the specific pattern among the (potential) array of patterns passed to -Pattern that matched on a given line.

    The caveat is that if multiple patterns match on a given line, .Pattern reports only the one that is listed first among them in the -Pattern argument.

    Here's a simple example, using an array of strings to simulate lines from files as input:

    'A fool and',
    'his barn',
    'are soon parted.',
    'foo and bar on the same line' | 
      Select-String -Pattern ('bar', 'foo') | 
        Select-Object  Line, LineNumber, Pattern
    

    The above yields:

    Line                         LineNumber Pattern
    ----                         ---------- -------
    A fool and                            1 foo
    his barn                              2 bar
    foo and bar on the same line          4 bar
    

    Note how 'bar' is listed as the Pattern value for the last line, even though 'foo' appeared first in the input line, because 'bar' comes before 'foo' in the pattern array.
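
Applied to the pipeline from the question, the Pattern property can simply be added to the Select-Object list. The following is a minimal sketch, not a definitive implementation: the temporary folder, the sample Sample.cs file, and the field names 'BatchId' and 'OrderId' are hypothetical stand-ins so that the code runs anywhere; substitute your own paths.

```powershell
# Hypothetical scratch data (assumption): a temp folder with a pattern list and one .cs file.
$dir = Join-Path ([IO.Path]::GetTempPath()) ('patdemo_' + [guid]::NewGuid().ToString('N'))
$null = New-Item -ItemType Directory -Path $dir
$inputFile  = Join-Path $dir 'DataColumnsNames.txt'
$outputFile = Join-Path $dir 'DataColumnsUsages.csv'
Set-Content -Path $inputFile -Value 'BatchId', 'OrderId'
Set-Content -Path (Join-Path $dir 'Sample.cs') -Value 'BatchId = NewId;', 'int unrelated = 0;', 'OrderId = BatchId;'

# Same pipeline as in the question, with the Pattern property included in the output.
$rows = Get-ChildItem $dir -Filter *.cs -Recurse -Force -ErrorAction SilentlyContinue |
  Select-String -Pattern (Get-Content $inputFile) |
    Select-Object Pattern, Path, LineNumber, Line

$rows | Export-Csv $outputFile -NoTypeInformation
$rows | Format-Table Pattern, LineNumber, Line
```

In this sketch the line 'OrderId = BatchId;' is reported with Pattern 'BatchId' only, because 'BatchId' comes first in the pattern list; that is the caveat described above.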


    To reflect the actual pattern that appears first on the input line in a Pattern property, more work is needed:

    • Formulate your array of patterns as a single regex using alternation (|), wrapped as a whole in a capture group ((...)), e.g., '(bar|foo)'.

      • Note: The expression used below, '({0})' -f ('bar', 'foo' -join '|'), constructs this regex dynamically from an array (here the array literal 'bar', 'foo', but you can substitute any array variable or even (Get-Content $inputFile)). If you want to treat the input patterns as literals and they happen to contain regex metacharacters (such as .), you'll need to escape them with [regex]::Escape() first.
    • Use a calculated property to define a custom Pattern property that reports the capture group's value, which is the first among the values encountered on each input line:

    'A fool and',
    'his barn',
    'are soon parted.',
    'foo and bar on the same line' | 
      Select-String -Pattern ('({0})' -f ('bar', 'foo' -join '|')) | 
        Select-Object Line, LineNumber, 
                      @{ n='Pattern'; e={ $_.Matches[0].Groups[1].Value } }
    

    This yields (abbreviated to show only the last match):

    Line                         LineNumber Pattern
    ----                         ---------- -------
    ...
    foo and bar on the same line          4 foo
    

    Now, 'foo' is properly reported as the matching pattern.
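
If the patterns should be treated as literals, the escaping step mentioned in the note above could look like the following sketch. The names in $names and the sample input lines are hypothetical; 'dbo.BatchId' was chosen only because it contains the regex metacharacter '.'.

```powershell
# Hypothetical field names (assumption); 'dbo.BatchId' contains the metacharacter '.'.
$names = 'dbo.BatchId', 'OrderId'

# Escape each name so that it matches literally, then join the names with
# alternation (|) inside a single capture group.
$regex = '({0})' -f (($names | ForEach-Object { [regex]::Escape($_) }) -join '|')

$result = 'x = dbo.BatchId', 'y = dboXBatchId', 'z = OrderId' |
  Select-String -Pattern $regex |
    Select-Object Line, LineNumber,
                  @{ n='Pattern'; e={ $_.Matches[0].Groups[1].Value } }

$result | Format-Table
```

Without the escaping, the '.' would match any character, so the second input line ('y = dboXBatchId') would match as well; with it, only the first and third lines are reported.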


    To report all patterns found on each line:

    • Switch -AllMatches is required to tell Select-String to find all matches on each line, represented in the .Matches collection of the MatchInfo output objects.

    • The .Matches collection must then be enumerated (via the .ForEach() collection method) to extract the capture-group value from each match.

    'A fool and',
    'his barn',
    'are soon parted.',
    'foo and bar on the same line' | 
      Select-String -AllMatches -Pattern ('({0})' -f ('bar', 'foo' -join '|')) | 
        Select-Object Line, LineNumber, 
                      @{ n='Pattern'; e={ $_.Matches.ForEach({ $_.Groups[1].Value }) } }
    

    This yields (abbreviated to show only the last match):

    Line                         LineNumber Pattern
    ----                         ---------- -------
    ...
    foo and bar on the same line          4 {foo, bar}
    

    Note how both 'foo' and 'bar' are now reported in Pattern, in the order encountered on the line.
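
For the CSV export from the question, a collection-valued Pattern property is awkward, because Export-Csv typically writes a collection as its type name rather than its elements. One possible approach, sketched below under assumptions, is to emit one output row per match instead, which also lines up directly with the original field name list. The temporary folder, the sample Sample.cs file, and the field names are hypothetical stand-ins; substitute your own paths.

```powershell
# Hypothetical scratch data (assumption): a temp folder with a pattern list and one .cs file.
$dir = Join-Path ([IO.Path]::GetTempPath()) ('patdemo_' + [guid]::NewGuid().ToString('N'))
$null = New-Item -ItemType Directory -Path $dir
$inputFile  = Join-Path $dir 'DataColumnsNames.txt'
$outputFile = Join-Path $dir 'DataColumnsUsages.csv'
Set-Content -Path $inputFile -Value 'BatchId', 'OrderId'
Set-Content -Path (Join-Path $dir 'Sample.cs') -Value 'BatchId = NewId;', 'int unrelated = 0;', 'OrderId = BatchId;'

# Build one literal-matching regex from the list, skipping empty lines.
$regex = '({0})' -f ((Get-Content $inputFile | Where-Object { $_ } |
                        ForEach-Object { [regex]::Escape($_) }) -join '|')

$rows = Get-ChildItem $dir -Filter *.cs -Recurse -Force -ErrorAction SilentlyContinue |
  Select-String -AllMatches -Pattern $regex |
    ForEach-Object {
      $mi = $_
      # Emit one row per match found on the line.
      foreach ($m in $mi.Matches) {
        [pscustomobject]@{
          Pattern    = $m.Groups[1].Value   # the text as it appears on the line
          Path       = $mi.Path
          LineNumber = $mi.LineNumber
          Line       = $mi.Line
        }
      }
    }

$rows | Export-Csv $outputFile -NoTypeInformation
```

Since Select-String matches case-insensitively by default, the Pattern value is the text as matched on the line, which may differ in case from the entry in the field name list. If a single row per line is preferred, the values can be joined into one string instead, e.g. with -join ', '.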