Using Compare-Object assigned to a variable:
$diff = Compare-Object -referenceobject (Get-Content $hashlog1 -Encoding UTF8 | select -skip 1) -differenceobject (Get-Content $hashlog2 -Encoding utf8 | select -skip 1) |
Select @{N='Hash'; E={($_.InputObject -split ' ',2)[0] }}, @{N='File'; E={((($_.InputObject -split ' ',2)[1]) -split ' ' | select -skiplast 2) -join ' ' }}, @{N='SizeDate'; E={($_.InputObject -split ' ' | select -last 2) -join ' ' }}, SideIndicator |
Format-Table -AutoSize | Out-String -Width 4096
Here is a sample log file (call it hashlog.txt) after running Compare-Object:
Hash File SizeDate SideIndicator
---- ---- -------- -------------
D41D8CD98F00B204E9800998ECF8427E \added.txt 0 20230107_021401 =>
6B7CA4894B3CFCBA1ECA6B8BB9656FE8 \dirsize2.bat 714 20231010_111350 =>
804AB051DB174BF5FF53911647A094C2 \SSDTest ™\logfile.txt 122644 20220806_221741 =>
9266C28971E3624B28DAB39ADEE0694E \SSDTest ™\logfilemixed.txt 14627 20220807_115714 =>
4298F8C3383A93D121A1A91764492F93 \SSDTest ™\newfile.rtf 42098 20231010_111523 =>
1233EEF8C71A6AF8D23068CCDF1E639D \SSDTest ™\Samsung 850 EVO Small Files Only.txt 205671 20220805_224013 =>
BD30A4E10CA22E3E3C6BA7063ACEBF0D \SSDTest ™\SSDCopy.bat 269 20220731_172008 =>
AF32AC8BEFDF8D3B63DE5D7B834709FD \SSDTest ™\SSDCopysmall.ps1 1670 20220806_221654 =>
AA7A12226FDD40671C191FAF6AA57733 \SSDTest ™\SSDCopy_Write_Read.bat 462 20220805_023811 =>
3EDC4CE7B65FBC140B7D0F5604F3070D \SSDTest ™\SSDMixedWrite.ps1 2476 20220807_115705 =>
768B150D59F7EA576F375430883CC8DA \SSDTest ™\SSDRead.ps1 2044 20220805_021614 =>
B7D8DA3D9C1387EBCEC0565176385BB7 \SMR FINAL\SMR_PC_TEST_1_FILLRND.log 13014 20220823_134552 <=
D41D8CD98F00B204E9800998ECF8427E \added.txt 0 20230107_021409 <=
E17C51851C46801F9399087B61736E34 \dirsize2.bat 711 20230107_000101 <=
E7A73A93D06EB266E614FC5110BBBF28 \SMR FINAL\SMR_PC_TEST_1_FILLRND.log 13014 20220822_133356 =>
804AB051DB174BF5FF53911647A094C2 \SSDTest\logfile.txt 122644 20220806_221741 <=
9266C28971E3624B28DAB39ADEE0694E \SSDTest\logfilemixed.txt 14627 20220807_115714 <=
1233EEF8C71A6AF8D23068CCDF1E639D \SSDTest\Samsung 850 EVO Small Files Only.txt 205671 20220805_224013 <=
AF32AC8BEFDF8D3B63DE5D7B834709FD \SSDTest\SSDCopysmall.ps1 1670 20220806_221654 <=
AA7A12226FDD40671C191FAF6AA57733 \SSDTest\SSDCopy_Write_Read.bat 462 20220805_023811 <=
3EDC4CE7B65FBC140B7D0F5604F3070D \SSDTest\SSDMixedWrite.ps1 2476 20220807_115705 <=
768B150D59F7EA576F375430883CC8DA \SSDTest\SSDRead.ps1 2044 20220805_021614 <=
How can I go about outputting only the entries whose File value occurs two or more times? For example, the above log would result in the following:
D41D8CD98F00B204E9800998ECF8427E \added.txt 0 20230107_021401 =>
D41D8CD98F00B204E9800998ECF8427E \added.txt 0 20230107_021409 <=
6B7CA4894B3CFCBA1ECA6B8BB9656FE8 \dirsize2.bat 714 20231010_111350 =>
E17C51851C46801F9399087B61736E34 \dirsize2.bat 711 20230107_000101 <=
B7D8DA3D9C1387EBCEC0565176385BB7 \SMR FINAL\SMR_PC_TEST_1_FILLRND.log 13014 20220823_134552 <=
E7A73A93D06EB266E614FC5110BBBF28 \SMR FINAL\SMR_PC_TEST_1_FILLRND.log 13014 20220822_133356 =>
I did attempt a nested ForEach-Object, but that output only a single entry per matched file and didn't list both.
It seems like this should be straightforward, but it isn't. Thanks in advance for any assistance.
(If you're wondering about the "Trademark" symbol ™, I was just making sure special characters were passing through.)
To achieve your goal, you need to operate on objects with distinct properties rather than on strings, which is what the end of your pipeline currently produces:
…| Format-Table -AutoSize | Out-String -Width 4096
Format-* cmdlets emit output objects whose sole purpose is to provide formatting instructions to PowerShell's for-display output-formatting system. In short: only ever use Format-* cmdlets to format data for display, never for subsequent programmatic processing.
See this answer for more information.
By additionally using Out-String, you end up with a single, multiline string that represents the original data in a format meant for human, not programmatic, consumption.
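To make the difference concrete, here is a minimal sketch that inspects what comes out of Format-Table and Out-String. The sample object and the variable names $entry, $formatted and $text are made up for illustration; only the cmdlets themselves come from the pipeline above.

```powershell
# Hypothetical sample object standing in for one Compare-Object result.
$entry = [pscustomobject]@{
    Hash = 'D41D8CD98F00B204E9800998ECF8427E'
    File = '\added.txt'
}

# Format-Table emits formatting-instruction objects, not the original object.
# Listing their type names shows internal format types rather than PSCustomObject.
$formatted = $entry | Format-Table
$formatted | ForEach-Object { $_.GetType().Name }

# The File property is no longer reachable on the formatted output
# (with strict mode off, accessing it simply yields $null).
$null -eq $formatted[0].File

# Out-String collapses everything into one multiline string.
$text = $entry | Format-Table | Out-String
$text -is [string]
```

Neither $formatted nor $text can be grouped by File, which is why the formatting has to come last, if at all.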
Remove the Format-Table and Out-String calls from your pipeline, which results in an array of objects, and then pipe $diff (or the modified pipeline directly) to Group-Object as follows:
$groups =
  $diff | Group-Object File | Where-Object Count -ge 2
This groups all objects by shared .File property values and uses Where-Object to filter them so as to output only those groups with 2 or more elements.
Each group is an instance of type Microsoft.PowerShell.Commands.GroupInfo, whose .Group property is a collection of all the elements in the group.
You can then use $groups to further process the groups programmatically; read on for how to display the resulting groups in a human-friendly fashion.
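As a rough illustration of such programmatic processing, the sketch below builds a small stand-in for $diff from a few entries in the question (with Format-Table and Out-String already removed) and walks the resulting groups. The hand-built sample array and the loop variable $g are assumptions for the example, not part of the original pipeline.

```powershell
# Hypothetical stand-in for $diff: objects with the same four properties
# that the Select-Object call in the question creates.
$diff = @(
    [pscustomobject]@{ Hash = 'D41D8CD98F00B204E9800998ECF8427E'; File = '\added.txt';          SizeDate = '0 20230107_021401';    SideIndicator = '=>' }
    [pscustomobject]@{ Hash = '6B7CA4894B3CFCBA1ECA6B8BB9656FE8'; File = '\dirsize2.bat';       SizeDate = '714 20231010_111350';  SideIndicator = '=>' }
    [pscustomobject]@{ Hash = 'D41D8CD98F00B204E9800998ECF8427E'; File = '\added.txt';          SizeDate = '0 20230107_021409';    SideIndicator = '<=' }
    [pscustomobject]@{ Hash = 'E17C51851C46801F9399087B61736E34'; File = '\dirsize2.bat';       SizeDate = '711 20230107_000101';  SideIndicator = '<=' }
    [pscustomobject]@{ Hash = '768B150D59F7EA576F375430883CC8DA'; File = '\SSDTest\SSDRead.ps1'; SizeDate = '2044 20220805_021614'; SideIndicator = '<=' }
)

# Keep only the File values that occur at least twice.
$groups = $diff | Group-Object File | Where-Object Count -ge 2

foreach ($g in $groups) {
    # .Name is the shared File value, .Count the number of elements.
    "$($g.Name): $($g.Count) entries"
    # .Group holds the original objects with all their properties intact.
    $g.Group | ForEach-Object { "  $($_.Hash) $($_.SideIndicator)" }
}
```

The entry for \SSDTest\SSDRead.ps1 occurs only once in this sample, so it is filtered out.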
To display the resulting groups and their elements group by group, Format-Table is appropriate:
$diff | Group-Object File -OutVariable groups | Where-Object Count -ge 2 |
  ForEach-Object Group | Format-Table -GroupBy File
Note:
ForEach-Object Group (parameter -MemberName is implied) in essence undoes the grouping (it outputs the elements of each group one by one), only for the -GroupBy argument of Format-Table to group them again for display (there's no way around that, as Format-Table -GroupBy itself lacks any filtering capability).
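The common -OutVariable parameter is used to store the group objects in the self-chosen variable $groups (note how the name is specified without the $), so that the groups can be programmatically processed later. Because -OutVariable is attached to Group-Object here, $groups holds all groups, including those with only one element; attach it to the Where-Object call instead if you want only the filtered groups.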
The above results in for-display output that:
prints a header between groups showing the shared .File property value for each group;
prints a header for each group, namely the names of the properties being displayed and a separator line.
To tweak this format:
Add -HideTableHeaders to omit the for-each-group headers.
If you also want to omit the between-groups headers, more work is needed:
$diff | Group-Object File -OutVariable groups | Where-Object Count -ge 2 |
  ForEach-Object {
    ($_.Group | Format-Table -HideTableHeaders | Out-String).Trim()
    '' # Output an empty line between groups.
  }
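If plain text lines in the shape shown in the question are wanted rather than table output, one possible approach is to join the property values yourself. The sketch below is a variation on the commands above, not the original method: it attaches -OutVariable to Where-Object so that only the filtered groups are captured, and the sample data and the names $dupGroups and $lines are hypothetical.

```powershell
# Hypothetical stand-in for $diff (objects, no Format-Table applied).
$diff = @(
    [pscustomobject]@{ Hash = 'D41D8CD98F00B204E9800998ECF8427E'; File = '\added.txt';          SizeDate = '0 20230107_021401';    SideIndicator = '=>' }
    [pscustomobject]@{ Hash = 'D41D8CD98F00B204E9800998ECF8427E'; File = '\added.txt';          SizeDate = '0 20230107_021409';    SideIndicator = '<=' }
    [pscustomobject]@{ Hash = '768B150D59F7EA576F375430883CC8DA'; File = '\SSDTest\SSDRead.ps1'; SizeDate = '2044 20220805_021614'; SideIndicator = '<=' }
)

# -OutVariable on Where-Object captures only the groups that pass the filter.
$null = $diff | Group-Object File | Where-Object { $_.Count -ge 2 } -OutVariable dupGroups

# Flatten the groups again and join the property values into one line per entry.
$lines = $dupGroups | ForEach-Object Group | ForEach-Object {
    '{0} {1} {2} {3}' -f $_.Hash, $_.File, $_.SizeDate, $_.SideIndicator
}
$lines
```

The resulting strings are for display or logging only; keep working with $dupGroups itself if further processing is needed.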