I need to remove rows from one CSV (file-a) that match or partially match rows in another CSV (file-b), based on the email address:
file-a
email,Firstname,Lastname
Peter@hotmail.com,pete,Smith
Paul@gmail.com,paul,
Mary@hotmail.com,,Jones
puff@yahoo.com,puff,Dragon
file-b
email,Firstname,Lastname
Peter@hotmail.com,,Smith
Mary@hotmail.com,Mary
deduped-output-file
email,Firstname,Lastname
Paul@gmail.com,paul,
puff@yahoo.com,puff,Dragon
I came across a similar question here:
Removing similar lines from two files
However, that only works for exact matches. I tried using "notmatch" instead of "notcontains", but this did not work. I'm quite new to PowerShell and can't quite figure out what I need to do. Any help would be greatly appreciated.
I'd first Import-Csv the files and then use Compare-Object restricted to the property email:
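## Q:\Test\2019\02\28\SO_54929339.ps1
# Read both files into collections of objects
$fileA = Import-Csv '.\file-a.csv'
$fileB = Import-Csv '.\file-b.csv'

# Compare on the email property only; '<=' marks rows that exist only in file-a
$deduped = Compare-Object -ReferenceObject $fileA -DifferenceObject $fileB -Property email -PassThru |
    Where-Object SideIndicator -eq '<=' |
    Select-Object * -ExcludeProperty SideIndicator

# Show the result, then write it to the output file
$deduped
$deduped | Export-Csv '.\deduped-output-file.csv' -NoTypeInformation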
Sample output:
> Q:\Test\2019\02\28\SO_54929339.ps1
email          Firstname Lastname
-----          --------- --------
Paul@gmail.com paul
puff@yahoo.com puff      Dragon
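As a possible alternative, the same filtering can be sketched with Where-Object and the -notin operator instead of Compare-Object. This is only a minimal sketch under a few assumptions: it needs PowerShell 3.0 or later (for -notin and for $fileB.email collecting the property from every row), it embeds the question's sample data inline through ConvertFrom-Csv so that it runs without the files, and, like the comparison above, it matches the email address as a whole (case-insensitively) rather than as a substring.

```powershell
# Inline copies of the sample data from the question, so no files are needed.
# With real files, Import-Csv '.\file-a.csv' and '.\file-b.csv' would be used instead.
$fileA = @'
email,Firstname,Lastname
Peter@hotmail.com,pete,Smith
Paul@gmail.com,paul,
Mary@hotmail.com,,Jones
puff@yahoo.com,puff,Dragon
'@ | ConvertFrom-Csv

$fileB = @'
email,Firstname,Lastname
Peter@hotmail.com,,Smith
Mary@hotmail.com,Mary
'@ | ConvertFrom-Csv

# Keep only the rows of file-a whose email does not occur in file-b.
# $fileB.email yields the email value of every row in file-b.
$deduped = $fileA | Where-Object { $_.email -notin $fileB.email }

# Print the remaining rows as CSV text.
$deduped | ConvertTo-Csv -NoTypeInformation
```

With the sample data this should leave only the Paul@gmail.com and puff@yahoo.com rows, in the order they appear in file-a. Piping $deduped to Export-Csv with -NoTypeInformation, as in the script above, would write them to a file instead.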