I have a CSV (in $inputCsv
) with three columns from which I would like to remove duplicates from SubName while keeping the minimum Rank category (minimum = TEST1 and maximum = TEST3)
Name SubName Rank
----- -------- -----
NAME1 SUBNAME1 TEST1
NAME2 SUBNAME2 TEST1
NAME2 SUBNAME2 TEST3
NAME3 SUBNAME3 TEST2
NAME4 SUBNAME4 TEST3
NAME4 SUBNAME4 TEST2
Desired result:
Name SubName Rank
----- -------- -----
NAME1 SUBNAME1 TEST1
NAME2 SUBNAME2 TEST1
NAME3 SUBNAME3 TEST2
NAME4 SUBNAME4 TEST2
Of course TEST3 can be kept in case it is the only value available.
For ease, I transform Rank
into a numeral column so that I can order it. I tried several Sort-Object
assemblies without success with, for example:
$inputCsv | Select-Object "Name", "SubName", "Rank", @{Name = 'Weight'; Expression = {
if ( $_."Rank" -eq "TEST1" ) { 1 }
elseif ( $_."Rank" -eq "TEST2" ) { 2 }
elseif ( $_."Rank" -eq "TEST3" ) { 3 } }
} | Sort-Object "SubName",{$_.Weight -lt 3} -Unique
Use Group-Object
first grouping by the SubName
property then for each group use Sort-Object
sorting by Rank
(the expression $_.Rank -replace '\D' -as [int]
is removing any non-numeric digit and then converting the resulting string into an integer for proper sorting) lastly use Select-Object
to take only 1 object and skip the rest:
$inputCsv | Group-Object SubName | ForEach-Object {
$_.Group |
Sort-Object { $_.Rank -replace '\D' -as [int] } |
Select-Object -First 1
}
NOTE: If the values in the Rank
property will always be a single digit you could just use Sort-Object Rank
.