Removing similar strings from csv using another csv


I need to remove rows from one CSV (file-a) that match or partially match rows in another CSV (file-b), based on the email address:

file-a

email,Firstname,Lastname 
Peter@hotmail.com,pete,Smith
Paul@gmail.com,paul,
Mary@hotmail.com,,Jones
puff@yahoo.com,puff,Dragon

file-b

email,Firstname,Lastname
Peter@hotmail.com,,Smith
Mary@hotmail.com,Mary

deduped-output-file

email,Firstname,Lastname 
Paul@gmail.com,paul,
puff@yahoo.com,puff,Dragon

I came across a similar question here:

Removing similar lines from two files

However, that approach only works for exact matches. I tried using -notmatch instead of -notcontains, but it did not work. I'm quite new to PowerShell and can't figure out what I need to do. Any help would be greatly appreciated.


Solution

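  • I'd first load both files with Import-Csv, then use Compare-Object restricted to the email property, so rows are matched on the address alone and differences in the other columns are ignored.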

    ## Q:\Test\2019\02\28\SO_54929339.ps1
    
    $fileA = Import-csv '.\file-a.csv'
    $fileB = Import-csv '.\file-b.csv'
    
    # Compare on email only; '<=' marks rows found only in file-a.
    $deduped = Compare-Object -ReferenceObject $fileA -DifferenceObject $fileB -Property email -PassThru |
      Where-Object SideIndicator -eq '<=' |
        Select-Object * -ExcludeProperty SideIndicator
    
    $deduped 
    $deduped | Export-Csv '.\deduped-output-file.csv' -NoTypeInformation
    

    Sample output:

    > Q:\Test\2019\02\28\SO_54929339.ps1
    
    email          Firstname Lastname
    -----          --------- ---------
    Paul@gmail.com paul
    puff@yahoo.com puff      Dragon
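
As an alternative to Compare-Object, the same email-only filtering can be sketched with the -notin operator (available from PowerShell 3.0). This is a minimal sketch, not the answer's original method: the here-strings stand in for file-a.csv and file-b.csv so the snippet runs on its own, and the variable name $emailsToRemove is made up for illustration.

```powershell
# Sample rows copied from the question; the here-strings replace the two CSV files.
$fileA = @'
email,Firstname,Lastname
Peter@hotmail.com,pete,Smith
Paul@gmail.com,paul,
Mary@hotmail.com,,Jones
puff@yahoo.com,puff,Dragon
'@ | ConvertFrom-Csv

$fileB = @'
email,Firstname,Lastname
Peter@hotmail.com,,Smith
Mary@hotmail.com,Mary
'@ | ConvertFrom-Csv

# Collect every address from file-b (member enumeration over the row objects).
$emailsToRemove = $fileB.email

# Keep only the file-a rows whose email is absent from that list.
# -notin compares case-insensitively by default; -cnotin would be case-sensitive.
$deduped = $fileA | Where-Object { $_.email -notin $emailsToRemove }

$deduped | Format-Table -AutoSize
```

With real files, the two here-strings would be replaced by the Import-Csv calls from the script above, and $deduped can be piped to Export-Csv in the same way. Because only the email column is compared, rows such as Peter's and Mary's are dropped even though their other columns differ between the two files.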