Say [hypothetically], I have two .CSVs I'm comparing to try and see which of my current members are original members... I wrote a nested ForEach-Object comparing every $name and $memberNumber from each object against every other object. It works fine, but is taking way to long, especially since each CSV has 10s of thousands of objects. Is there another way I should approach this?
Original_Members.csv
Name, Member_Number
Alice, 1234
Jim , 4567
Current_Members.csv
Alice, 4599
Jim, 4567
$currentMembers = import-csv $home\Desktop\current_members.csv |
ForEach-Object {
$name = $_.Name
$memNum = $_."Member Number"
$ogMembers = import-csv $home\Desktop\original_members.csv" |
ForEach-Object {
If ($ogMembers.Name -eq $name -and $ogMembers."Member Number" -eq $memNum) {
$ogMember = "Yes"
}
Else {
$ogMember = "No"
}
}
[pscustomobject]@{
"Name"=$name
"Member Number"=$memNum
"Original Member?"=$ogMember
}
} |
select "Name","Member Number","Original Member?" |
Export-CSV "$home\Desktop\OG_Compare_$(get-date -uformat "%d%b%Y").csv" -Append -NoTypeInformation
Assuming both of your files are like the below:
Original_Members.csv
Name, Member_Number
Alice, 1234
Jim, 4567
Current_Members.csv
Name, Member_Number
Alice, 4599
Jim, 4567
You could store the original member names in a System.Collections.Generic.HashSet<T>
for constant time lookups, instead of doing a linear search for each name. We can use System.Linq.Enumerable.ToHashSet
to create a hashset of string[]
names.
We can then use Where-Object
to filter current names by checking if the hashset contains the original name with System.Collections.Generic.HashSet<T>.Contains(T)
, which is an O(1) method.
$originalMembers = Import-Csv -Path .\Original_Members.csv
$currentMembers = Import-Csv -Path .\Current_Members.csv
$originalMembersLookup = [Linq.Enumerable]::ToHashSet(
[string[]]$originalMembers.Name,
[StringComparer]::CurrentCultureIgnoreCase
)
$currentMembers |
Where-Object {$originalMembersLookup.Contains($_.Name)}
Which will output the current members that were original members:
Name Member_Number
---- -------------
Alice 4599
Jim 4567
As requested in the comments, If we want to check both Name
and Member_Number
, we can concatenate both strings to use for lookups:
$originalMembers = Import-Csv -Path .\Original_Members.csv
$currentMembers = Import-Csv -Path .\Current_Members.csv
$originalMembersLookup = [Linq.Enumerable]::ToHashSet(
[string[]]($originalMembers |
ForEach-Object {
$_.Name + $_.Member_Number
}),
[StringComparer]::CurrentCultureIgnoreCase
)
$currentMembers |
Where-Object {$originalMembersLookup.Contains($_.Name + $_.Member_Number)}
Which will now only return:
Name Member_Number
---- -------------
Jim 4567