
How to compare a file on a local PC with one on GitHub?


I have a file on my PC called test.ps1.

I have a file hosted on my GitHub called test.ps1.

Both of them have the same contents: a string.

I am using the following script to try to compare them:

$fileA = Get-Content -Path "C:\Users\User\Desktop\test.ps1"
$fileB = (Invoke-webrequest -URI "https://raw.githubusercontent.com/repo/Scripts/test.ps1")

if(Compare-Object -ReferenceObject $fileA -DifferenceObject ($fileB -split '\r?\n'))
 {"files are different"}
Else {"Files are the same"}

echo ""
Write-Host $fileA
echo ""
Write-Host $fileB

However, my output shows the exact same data for both, yet it says the files are different. The output:

files are different

a string

a string

Is there some weird EOL thing going on or something?


Solution

  • tl;dr

    # Remove a trailing newline from the downloaded file content
    # before splitting into lines.
    # Parameter names omitted for brevity.
    Compare-Object  $fileA  ($fileB -replace '\r?\n\z' -split '\r?\n' )
    

    If the files are truly identical (save for any character-encoding and newline-format differences, and whether or not the local file has a trailing newline), you'll see no output (because Compare-Object only reports differences by default).


    If the lines look the same, character encoding is probably not the problem. Still, note that Get-Content in Windows PowerShell assumes, in the absence of a BOM, that a file is ANSI-encoded, so a BOM-less UTF-8 file that contains characters outside the ASCII range will be misinterpreted; use -Encoding utf8 to fix that.
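As a rough illustration of the encoding point, the sketch below writes a BOM-less UTF-8 file with one non-ASCII character and reads it back with an explicit encoding. The temp-file name and the sample text are made up for the demonstration and are not part of the original question.

```powershell
# Hypothetical demo file in the temp directory; substitute your own path.
$path = Join-Path ([IO.Path]::GetTempPath()) 'encoding-demo.txt'

# Write "café" plus a newline as UTF-8 *without* a BOM.
$utf8NoBom = New-Object System.Text.UTF8Encoding $false
[IO.File]::WriteAllText($path, "caf$([char]0xE9)`n", $utf8NoBom)

# Read the file back, stating the encoding explicitly instead of
# relying on the BOM-less default of Windows PowerShell.
$lines = Get-Content -Path $path -Encoding utf8

Remove-Item -Path $path
```

With -Encoding utf8 the line should round-trip unchanged; without it, Windows PowerShell would be expected to decode the two UTF-8 bytes of "é" as two separate ANSI characters.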

    Assuming that the files are truly identical (including not having variations in whitespace, such as trailing spaces at the end of lines), the likeliest explanation is that the file being retrieved has a trailing newline, as is typical for text files.

    Thus, if the downloaded file has a trailing newline, as is to be expected, applying -split '\r?\n' to the multi-line string that represents the entire file content yields an extra, empty array element at the end, and Compare-Object reports that element as a difference.
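The effect can be reproduced without any download. This is a minimal sketch that uses a hard-coded sample string (an assumption standing in for the real file content) to show the extra element and how the trailing-newline removal avoids it.

```powershell
# Simulated downloaded content with a trailing newline (sample data).
$content = "a string`n"

# Splitting as-is produces two elements: 'a string' and an empty string.
$naive = $content -split '\r?\n'
$naive.Count      # 2

# Removing the trailing newline first leaves only the real line.
$trimmed = $content -replace '\r?\n\z' -split '\r?\n'
$trimmed.Count    # 1
```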

    Any object that Compare-Object emits evaluates to $true in the implied Boolean context of your if statement's conditional, which is why "files are different" is output.
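A short sketch of that Boolean coercion, using small in-memory arrays (hypothetical sample data) rather than real files:

```powershell
$reference = @('a string')
$withExtra = @('a string', '')   # mimics the extra empty element

# One difference object is emitted for the empty string, so the
# result is non-empty and coerces to $true.
$diff = Compare-Object -ReferenceObject $reference -DifferenceObject $withExtra
$hasDiff = [bool] $diff          # True

# Identical inputs emit nothing, i.e. $null, which coerces to $false.
$none = Compare-Object -ReferenceObject $reference -DifferenceObject $reference
$hasNone = [bool] $none          # False
```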

    The -replace operation in the tl;dr command, -replace '\r?\n\z' (\z matches the very end of a (multi-line) string), compensates for that by removing the trailing newline before splitting into lines.
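Putting the pieces together, the comparison could be wrapped roughly as follows. Test-SameContent is a hypothetical helper name, not an existing cmdlet, and the commented usage assumes that the response's .Content property is a string, which is typical for text responses but not guaranteed for every content type.

```powershell
# Hypothetical helper: returns $true when the local lines match the
# remote text, ignoring newline format and one trailing newline.
function Test-SameContent {
    param(
        [string[]] $LocalLines,   # e.g. the output of Get-Content
        [string]   $RemoteText    # e.g. the downloaded file as one string
    )
    # Drop the trailing newline, then split into lines.
    $remoteLines = $RemoteText -replace '\r?\n\z' -split '\r?\n'
    # No output from Compare-Object means no differences were found.
    -not (Compare-Object -ReferenceObject $LocalLines -DifferenceObject $remoteLines)
}

# Usage sketch with the paths from the question (not executed here):
# $fileA = Get-Content -Path "C:\Users\User\Desktop\test.ps1"
# $fileB = (Invoke-WebRequest -Uri "https://raw.githubusercontent.com/repo/Scripts/test.ps1").Content
# if (Test-SameContent $fileA $fileB) { "Files are the same" } else { "files are different" }
```

The sketch does not handle an empty local file, where Get-Content returns nothing and Compare-Object may reject the empty reference input.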