Search code examples
powershellcompareobject

Powershell Compare-Object Output Separate Files for each SideIndicator


(This is probably something rather simple I'm missing; but I can't seem to figure it out and haven't found any answers in search)

I need to compare two CSV files with the same columns and output the row differences as follows (final output in Unicode Text):

  • If row exists in FileA but not FileB, label that row "Good"
  • If row exists in FileB but not FileA, label that row "Bad"

Let's say I have the following sample data:

File A:
Column1,Column2,Column3
Tommy,4133,20180204
Suzie,5200,20210112
Tammy,221,20201010

File B:
Column1,Column2,Column3
Tommy,4133,20180204
Nicky,5200,20190520

Here is my current code (borrowing the hash-enabled Compare-Object2 from this site because the delivered Compare-Object is too slow -- FYI, I'm using Get-Content instead of Import-Csv because it's a good 50-times faster since we're comparing entire row. And the MyHeader variable is just to retain the original file's header column values)

Compare-Object2 (Get-Content $FileA) (Get-Content $FileB) -PassThru |
Select-Object @{l=[string]$MyHeader;e={$_.InputObject}},
              @{n='Row Label'; e={ @{'=>' = 'Bad' ; '<=' = 'Good'}[$_.SideIndicator]}},
              @{n='Placeholder'; e={@{'*'='0'}['*']}} |
Sort-Object 'Row Label' -Descending | Export-Csv "$FinalCSV" -NoType;

#Removing " char to create CSV with original and added columns together
Set-Content "$FinalCSV" ((Get-Content "$FinalCSV") -replace '"');

#Convert csv to tab delimited
Import-Csv "$FinalCSV" | Export-Csv "$FinalTXT"  -NoTypeInformation -Delimiter "`t";

#Remove " char and convert to unicode
Set-Content -Encoding UNICODE "$FinalTXT" ((Get-Content "$FinalTXT") -replace '"')

This works perfectly (I know some of it is redundant at the end; but hey: it's the best I could do -- but definitely feel free to fix those parts too!) to create a single output file of the Good and the Bad -- about 40 seconds for two files with 400K rows.

Result File:
Column1 Column2 Column3 Row Label   Placeholder
Suzie   5200    20210112    Good    0
Tammy   221 20201010    Good    0
Nicky   5200    20210112    Bad 0

The problem is, I now need to create them as separate files: one file for the good, one for the bad. So the new needed output would be:

ResultFileGood:
Column1 Column2 Column3 Row Label   Placeholder
Suzie   5200    20210112    Good    0
Tammy   221 20201010    Good    0

ResultFileBad:
Column1 Column2 Column3 Row Label   Placeholder
Nicky   5200    20210112    Bad 0

And I just know there has to be a way to do it without having to run the compare twice - some use of the Where-Object prop or a loop of some sort. I just can't figure it out; so I'm coming to the experts.

Thanks

EDIT: Thanks to postanote, one viable alternative is to just output the combined file and then just split that, which is definitely faster than running the entire compare routine twice. Still would like to see if there's a way to do it directly in the comparison export without the intermediate file; but that's definitely a viable option and what I'm using for now

$FinalHeader = get-content "$FinalTXT" | Select -First 1
$BadOutput = Select-String -Path $FinalTXT -Pattern ('Bad   0')
$GoodOutput = Select-String -Path $FinalTXT -Pattern ('Good 0')
@($FinalHeader,$BadOutput.Line) | Out-File "$FinalBadTXT" -Encoding UNICODE;
@($FinalHeader,$GoodOutput.Line) | Out-File "$FinalGoodTXT" -Encoding UNICODE;

Solution

  • Continuing from my comment.

    you've got a lot going on there; i.e., some proxy function, etc.

    Mixing these items like you are, you end up with stuff like this... (very simplified of course, and since you are to showing your input you are forcing us to guess to come up with one.)

    psEdit -filenames 'D:\temp\book1.txt'
    # Results
    <#
    Site,Dept,Office,Floor
    Main,aaa,bbb,ccc
    Main0,aaa,bbb,ccc
    Branch1,ddd,eee,fff
    Branch2,ggg,hhh,iii
    #>
    
    psEdit -filenames 'D:\temp\book3.txt'
    # Results
    <#
    Site,Dept,Office,Floor
    Main,aaa,bbb,ccc
    Branch1,ddd,eee,fff
    Branch2,ggg,hhh,iii
    Branch3,jjj,kkk,lll
    Branch4,mmm,nnn,ooo
    #>
    

    Update:

    Removing all the previous stuff, since they were not your cup of tea...

    ;-}

    Compare-Object2 -ReferenceObject (Get-Content -Path 'D:\temp\book1.txt') -DifferenceObject (Get-Content -Path 'D:\temp\book3.txt') | 
    Export-Csv -Path 'D:\Temp\CompareObject.csv' -NoTypeInformation -Force
    
    (Select-String -Path 'D:\Temp\CompareObject.csv' -Pattern '\<=') -replace '.*CompareObject.*:\"|\"\,.*' | 
    ConvertFrom-Csv -Header Site, Dept, Office, Floor | 
    Export-Csv -Path 'D:\temp\ReferenceObject.csv' -NoTypeInformation -Force
    
    (Select-String -Path 'D:\Temp\CompareObject.csv' -Pattern '\=>') -replace '.*CompareObject.*:\"|\"\,.*' | 
    ConvertFrom-Csv -Header Site, Dept, Office, Floor | 
    Export-Csv -Path 'D:\temp\DifferenceObject.csv' -NoTypeInformation -Force
    
    $FileList = 'ReferenceObject.csv', 'DifferenceObject.csv'
    
    $FileList | 
    ForEach-Object {
        "`n********* Getting content $PSItem *********`n"
        Import-Csv -Path  "D:\temp\$PSItem"
    }
    # Results
    <#
    ********* Getting content ReferenceObject.csv *********
    
    
    Site    Dept Office Floor
    ----    ---- ------ -----
    Main0   aaa  bbb    ccc  
    
    ********* Getting content DifferenceObject.csv *********
    
    Branch3 jjj  kkk    lll  
    Branch4 mmm  nnn    ooo 
    #>
    

    So, as for your last comment:


    While that method still uses the intermediary file; I admit I completely wasn't thinking about the simple approach of just exporting the combined file and then just splitting that.***

    Ok, then, without using the 'intermediary file'.

    ($ComparedObjects = Compare-Object2 -ReferenceObject (Get-Content -Path 'D:\temp\book1.txt') -DifferenceObject (Get-Content -Path 'D:\temp\book3.txt'))
    # Results
    <#
    InputObject         SideIndicator
    -----------         -------------
    Main0,aaa,bbb,ccc   <=           
    Branch3,jjj,kkk,lll =>           
    Branch4,mmm,nnn,ooo => 
    #>
    
    ($ComparedObjects -match '<=').InputObject | 
    ConvertFrom-Csv -Header Site, Dept, Office, Floor 
    # Results
    <#
    Site  Dept Office Floor
    ----  ---- ------ -----
    Main0 aaa  bbb    ccc  
    #>
    
    ($ComparedObjects -match '=>').InputObject | 
    ConvertFrom-Csv -Header Site, Dept, Office, Floor 
    # Results
    <#
    Site    Dept Office Floor
    ----    ---- ------ -----
    Branch3 jjj  kkk    lll  
    Branch4 mmm  nnn    ooo 
    #>
    

    Then a just export to csv.

    ($ComparedObjects -match '<=').InputObject | 
    ConvertFrom-Csv -Header Site, Dept, Office, Floor | 
    Export-Csv -Path 'D:\temp\ReferenceObject.csv' -NoTypeInformation -Force
    
    ($ComparedObjects -match '=>').InputObject | 
    ConvertFrom-Csv -Header Site, Dept, Office, Floor | 
    Export-Csv -Path 'D:\temp\DifferenceObject.csv' -NoTypeInformation -Force
    

    Read back as needed

    $FileList = 'ReferenceObject.csv', 'DifferenceObject.csv'
    
    $FileList | 
    ForEach-Object {
        "`n********* Getting content $PSItem *********`n"
        Import-Csv -Path  "D:\temp\$PSItem"
    }
    # Results
    <#
    ********* Getting content ReferenceObject.csv *********
    
    
    Site    Dept Office Floor
    ----    ---- ------ -----
    Main0   aaa  bbb    ccc  
    
    ********* Getting content DifferenceObject.csv *********
    
    Branch3 jjj  kkk    lll  
    Branch4 mmm  nnn    ooo  
    #>
    

    Update

    Per your comment --


    'the problem is the final output need: the Unicode Tab-delimited text with the additional columns.'


    (($ComparedObjects -match '<=').InputObject) -replace ',', "`t" | 
    ConvertFrom-Csv -Delimiter "`t" -Header Site, Dept, Office, Floor  | 
    Export-Csv -Path 'D:\temp\ReferenceObject.csv' -Encoding Unicode -NoTypeInformation -Force
    Import-Csv -Path 'D:\temp\ReferenceObject.csv'
    # Results
    <#
    Site  Dept Office Floor
    ----  ---- ------ -----
    Main0 aaa  bbb    ccc  
    #>
    
    
    (($ComparedObjects -match '=>').InputObject) -replace ',', "`t" | 
    ConvertFrom-Csv -Delimiter "`t" -Header Site, Dept, Office, Floor  | 
    Export-Csv -Path 'D:\temp\DifferenceObject.csv' -Encoding Unicode -NoTypeInformation -Force
    Import-Csv -Path 'D:\temp\DifferenceObject.csv'
    # Results
    <#
    Site    Dept Office Floor
    ----    ---- ------ -----
    Branch3 jjj  kkk    lll  
    Branch4 mmm  nnn    ooo  
    #>
    

    Or for the extra column stuff, you could do this...

    $ComparedObjects -match '<=' | 
    Select-Object -Property @{
        Name       = 'Site'
        Expression = {($PSItem.InputObject -split ',')[0]}
    },
    @{
        Name       = 'Dept'
        Expression = {($PSItem.InputObject -split ',')[1]}
    },
    @{
        Name       = 'Office'
        Expression = {($PSItem.InputObject -split ',')[2]}
    },
    @{
        Name       = 'Floor'
        Expression = {($PSItem.InputObject -split ',')[3]}
    },
    @{
        Name       = 'Label'
        Expression = {'Good'}
    }, 
    @{
        Name       = 'Placeholder'
        Expression = {0}
    } |  
    Export-Csv -Path 'D:\temp\ReferenceObject.csv' -Encoding Unicode -NoTypeInformation -Force
    (Get-Content -Path 'D:\temp\ReferenceObject.csv') -replace '"','' -replace ',', "`t" | 
    Set-Content -PassThru 'D:\temp\ReferenceObject.csv'
    Import-Csv -Path 'D:\temp\ReferenceObject.csv' -Delimiter "`t" | 
    Format-Table -AutoSize
    # Results
    <#
    Site  Dept Office Floor Label Placeholder
    ----  ---- ------ ----- ----- -----------
    Main0 aaa  bbb    ccc   Good  0 
    #>
    
    
    $ComparedObjects -match '=>' | 
    Select-Object -Property @{
        Name       = 'Site'
        Expression = {($PSItem.InputObject -split ',')[0]}
    },
    @{
        Name       = 'Dept'
        Expression = {($PSItem.InputObject -split ',')[1]}
    },
    @{
        Name       = 'Office'
        Expression = {($PSItem.InputObject -split ',')[2]}
    },
    @{
        Name       = 'Floor'
        Expression = {($PSItem.InputObject -split ',')[3]}
    },
    @{
        Name       = 'Label'
        Expression = {'Good'}
    }, 
    @{
        Name       = 'Placeholder'
        Expression = {0}
    } | 
    Export-Csv -Path 'D:\temp\DifferenceObject.csv' -Encoding Unicode -NoTypeInformation -Force
    (Get-Content -Path 'D:\temp\DifferenceObject.csv') -replace '"','' -replace ',', "`t" | 
    Set-Content -PassThru 'D:\temp\DifferenceObject.csv'
    Import-Csv -Path 'D:\temp\DifferenceObject.csv' -Delimiter "`t" | 
    Format-Table -AutoSize
    # Results
    <#
    Site    Dept Office Floor Label Placeholder
    ----    ---- ------ ----- ----- -----------
    Branch3 jjj  kkk    lll   Good  0          
    Branch4 mmm  nnn    ooo   Good  0 
    #>