Search code examples
powershellimport-csvforeach-object

Compare multiple elements in an object against multiple elements in another object of a different array


Say [hypothetically], I have two .CSVs I'm comparing to try and see which of my current members are original members... I wrote a nested ForEach-Object comparing every $name and $memberNumber from each object against every other object. It works fine, but is taking way to long, especially since each CSV has 10s of thousands of objects. Is there another way I should approach this?

Original_Members.csv

Name, Member_Number

Alice, 1234

Jim , 4567

Current_Members.csv

Alice, 4599

Jim, 4567

$currentMembers = import-csv $home\Desktop\current_members.csv |

ForEach-Object {
    $name = $_.Name      
    $memNum = $_."Member Number"

    $ogMembers = import-csv $home\Desktop\original_members.csv" |
        ForEach-Object {
            If ($ogMembers.Name -eq $name -and $ogMembers."Member Number" -eq $memNum) {
                $ogMember = "Yes"
            }  
            Else {
                $ogMember = "No"
            }
        }
            [pscustomobject]@{
            "Name"=$name
            "Member Number"=$memNum
            "Original Member?"=$ogMember
            }
} |

select "Name","Member Number","Original Member?" |
Export-CSV "$home\Desktop\OG_Compare_$(get-date -uformat "%d%b%Y").csv" -Append -NoTypeInformation

Solution

  • Assuming both of your files are like the below:

    Original_Members.csv

    Name, Member_Number
    Alice, 1234
    Jim, 4567
    

    Current_Members.csv

    Name, Member_Number
    Alice, 4599
    Jim, 4567
    

    You could store the original member names in a System.Collections.Generic.HashSet<T> for constant time lookups, instead of doing a linear search for each name. We can use System.Linq.Enumerable.ToHashSet to create a hashset of string[] names.

    We can then use Where-Object to filter current names by checking if the hashset contains the original name with System.Collections.Generic.HashSet<T>.Contains(T), which is an O(1) method.

    $originalMembers = Import-Csv -Path .\Original_Members.csv
    $currentMembers = Import-Csv -Path .\Current_Members.csv
    
    $originalMembersLookup = [Linq.Enumerable]::ToHashSet(
        [string[]]$originalMembers.Name, 
        [StringComparer]::CurrentCultureIgnoreCase
    )
    
    $currentMembers | 
        Where-Object {$originalMembersLookup.Contains($_.Name)}
    

    Which will output the current members that were original members:

    Name  Member_Number
    ----  -------------
    Alice 4599
    Jim   4567
    

    Update

    As requested in the comments, If we want to check both Name and Member_Number, we can concatenate both strings to use for lookups:

    $originalMembers = Import-Csv -Path .\Original_Members.csv
    $currentMembers = Import-Csv -Path .\Current_Members.csv
    
    $originalMembersLookup = [Linq.Enumerable]::ToHashSet(
        [string[]]($originalMembers | 
            ForEach-Object {
                $_.Name + $_.Member_Number
            }), 
        [StringComparer]::CurrentCultureIgnoreCase
    )
    
    $currentMembers | 
        Where-Object {$originalMembersLookup.Contains($_.Name + $_.Member_Number)}
    

    Which will now only return:

    Name Member_Number
    ---- -------------
    Jim  4567