Search code examples
powershellconvertto-json

Issue with PowerShell ConvertTo-Json adding un-printable characters to beginning of file


I am having a problem with a PowerShell ConvertTo-Json command. The resulting file has two non-printable characters as the first to characters of the file. Using Format-Hex, the return values are:

           00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F

00000000   FF FE 5B 00 0D 00 0A 00 20 00 20 00 20 00 20 00  .þ[..... . . . .
00000010   7B 00 0D 00 0A 00 20 00 20 00 20 00 20 00 20 00  {..... . . . . .
...

The bad characters are the FF and FE in the 00 and 01 positions. The command I am using to generate the file is the following:

$search.resources.attributes |select $Object.Attributes.ID  |Sort-Object -Property displayname | ConvertTo-Json | Out-File  ([Environment]::GetFolderPath("Desktop")+"\RL_Identities.json")

The $search is a result of an Invoke-RestMethod call.

I am importing this file into an Oracle CLOB column. This column has a CHECK (COLUMN_NAME IS JSON) check constraint and it fails with the two un-printable characters in the file. If I open the file in Notepad++, do a select all and copy/paste to a new file, the new file loads perfectly because it doesn't have the two characters at the beginning.

Is there any reason the two characters are there? Is it "feature" of the ConvertTo-Json command or could it be coming from the data in the Invoke-RestMethod call? If there is no way to prevent these characters from being there, is there a way to programmatically remove the first two bytes of the file?


Solution

  • Your file uses the "Unicode" (UTF-16LE) character encoding, which is what you get by default in Windows PowerShell when you use > / Out-File.

    • The first two bytes, FF and FE, make up the so-called BOM (byte-order mark), aka Unicode signature, which identifies the encoding.

    Instead, use Set-Content (or -Out-File) with the -Encoding parameter to specify the desired encoding.

    The caveat is that if you need UTF-8, -Encoding utf8 in Windows PowerShell creates UTF-8 files with a BOM, which not all consumers understand.

    • In PowerShell (Core) 7+, by contrast, you get BOM-less UTF-8 by default, across all cmdlets (and therefore also with >).

    If you're on Windows PowerShell and need to create a BOM-less UTF-8 file, you can use the following workaround via New-Item:

    # Creates out.txt with BOM-less UTF-8 encoding.
    # Note that the -Value argument must be a single, potentially multi-line
    # string and a trailing newline is NOT added.
    $null = New-Item -Force out.txt -Value (
              ConvertTo-Json 'hü'
            )
    

    The sample ConvertTo-Json call results in verbatim "hü". Passing the resulting file to Format-Hex shows that the file has no BOM and that ü (LATIN SMALL LETTER U WITH DIAERESIS, U+00FC) is correctly encoded as UTF-8 byte sequence 0xC3, 0xBC:

               Path: C:\Users\jdoe\out.txt
    
               00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
    
    00000000   22 68 C3 BC 22                                   "hü"