powershell, powershell-3.0

How do I write UTF8 with no BOM to console (no file)?


I have a PowerShell script that returns some strings via Write-Output. I would like those lines to be UTF-8 with no BOM. I do not want a global setting; I just want this to apply to the few lines I write at that point.

This other question helped me get to a point: Using PowerShell to write a file in UTF-8 without the BOM

I took inspiration from one of the answers, and wrote the following code:

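$mystr = "test 1 2 3"
# Encode the string as UTF-8 bytes (GetBytes() does not emit a BOM)
$mybytes = [Text.Encoding]::UTF8.GetBytes($mystr)
$OutStream = [console]::OpenStandardOutput()
# Write the whole array: offset 0, count = number of encoded bytes
$OutStream.Write($mybytes, 0, $mybytes.Length)
$OutStream.Close()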

However, this code ONLY writes to stdout, and if I try to redirect the output, the redirection is ignored. In other words, putting that code in test.ps1 and running test.ps1 >out.txt still prints to the console instead of to out.txt.

Could someone recommend how I could write this code so that, if a user redirects the output of my PowerShell script to a file via >, that output is UTF-8 with no BOM?


Solution

  • To add to Frode F.'s helpful answer:

    • What you were ultimately looking to achieve was to write a raw byte stream to PowerShell's success-output stream (the equivalent of stdout in traditional shells[0]), not to the console.

      • The success output stream is what commands in PowerShell use to pass data to each other, including to output-redirection operator >, at which point the console isn't involved.

      • (Data written to the success-output stream may end up displayed in the console, namely if the stream is neither captured in a variable nor redirected elsewhere.)

    • However, it is not possible to send raw byte streams to PowerShell's success output stream; only objects (instances of .NET types) can be sent, because PowerShell is fundamentally object-oriented.

      • Even data representing a stream of bytes must be sent as a .NET object, such as a [byte[]] array.

        • However, redirecting a [byte[]] array directly to a file with > does not write the array's raw bytes, because > creates a "Unicode" (UTF-16LE-encoded[1]) text representation of the array (the same representation you would see if you printed the array to the console).
      • In order to encode objects as byte streams (that are often encoded text) for external sinks such as a file, you need the help of PowerShell cmdlets (e.g., Set-Content), > (the output redirection operator), or the methods of appropriate .NET types (e.g., [System.IO.File]), except in 2 special cases:

        • When piping to an external program, the encoding stored in preference variable $OutputEncoding is implicitly used.
        • When printing to the console, the encoding stored in [Console]::OutputEncoding is implicitly used; also, output from external programs is assumed to be encoded that way[2].
      • Generally, when it comes to text output, it is simpler to use the -Encoding parameter of output cmdlets such as Set-Content to let that cmdlet perform the encoding rather than trying to obtain a byte representation in a separate first step.

        • However, a BOM-less UTF-8 encoding cannot be selected this way in Windows PowerShell (it can in PowerShell Core), so using an explicit byte representation is an option, in combination with Set-Content -Encoding Byte[3]; e.g.:

          # Write string "hü" to a UTF-8-encoded file *without BOM*:
          [Text.Encoding]::UTF8.GetBytes('hü') | 
            Set-Content -Encoding Byte file.txt
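
The [System.IO.File] route mentioned above can be sketched as follows. This is a minimal example rather than the only way to do it: the temp-file path is an assumption made for illustration, and New-Object is used instead of ::new() so that the code should also run on PowerShell v3.

```powershell
# Assumption for illustration: write to a throwaway file in the temp directory.
# A full path is used because .NET methods resolve relative paths against the
# process's working directory, which may differ from PowerShell's current location.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'utf8-nobom-demo.txt'

# Build "hü" without a non-ASCII literal, so the script file's own encoding doesn't matter.
$text = 'h' + [char] 0xFC

# $false = do not emit a BOM.
$utf8NoBom = New-Object System.Text.UTF8Encoding $false

# Encode and write in one step; no BOM is written.
[System.IO.File]::WriteAllText($path, $text, $utf8NoBom)

# Inspect the raw bytes to confirm: 'h' is 0x68, 'ü' is 0xC3 0xBC.
$bytes = [System.IO.File]::ReadAllBytes($path)
($bytes | ForEach-Object { $_.ToString('X2') }) -join ' '   # → 68 C3 BC
```

Compared with the Set-Content -Encoding Byte approach, this keeps the text as a string until the moment it is written, at the cost of bypassing PowerShell's output streams entirely.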
          

    [0] Writing to stdout from within PowerShell, as you attempted, bypasses PowerShell's own system of output streams and prints directly to the console. (As an aside: Console.OpenStandardOutput() returns the process's underlying stdout stream, so it also bypasses in-process redirections such as Console.SetOut(); only a redirection applied from outside the PowerShell process, e.g. by cmd.exe, affects it.)

    [1] Up to PowerShell v5.0, you couldn't change the encoding used by >; in PSv5.1 and above, you can use something like $PSDefaultParameterValues['Out-File:Encoding']='UTF8' - that would still include a BOM, however. For background, see this answer of mine.
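
To see the effect described in [1] for yourself, the following sketch redirects a [byte[]] array with > and then inspects the resulting file. The temp-file path is an assumption for illustration; the exact byte count depends on the PowerShell edition and platform, so none is claimed here.

```powershell
# Assumption for illustration: a throwaway file in the temp directory.
$path  = Join-Path ([System.IO.Path]::GetTempPath()) 'redirect-demo.txt'
$bytes = [Text.Encoding]::UTF8.GetBytes('hi')   # 2 bytes: 0x68 0x69

# > formats the array as text (one decimal number per line) and encodes that text.
$bytes > $path

# Compare the size of the array with the size of the file that > produced.
$written = [System.IO.File]::ReadAllBytes($path)
"Bytes in array: $($bytes.Length); bytes in file: $($written.Length)"

# The file contains the text lines 104 and 105, not the characters "hi".
Get-Content -LiteralPath $path
```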

    [2] There is a noteworthy asymmetry: on sending text to external programs, $OutputEncoding defaults to ASCII (7-bit only) encoding in Windows PowerShell, which means that any non-ASCII characters get transliterated to literal ? characters; by contrast, on interpreting text from external programs, the applicable [Console]::OutputEncoding defaults to the system's active legacy OEM code page, which is an 8-bit encoding. See the list of code pages supported by Windows.
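
The lossy ASCII behavior described in [2] can be reproduced with .NET alone, and a script can opt into BOM-less UTF-8 for its own pipes to external programs by assigning $OutputEncoding locally. A rough sketch: the commented-out external program name is hypothetical, and the claim that the assignment stays local assumes the code runs inside a script or function rather than at the interactive prompt.

```powershell
$text = 'h' + [char] 0xFC   # "hü", built without a non-ASCII literal

# What a 7-bit ASCII encoding does to the non-ASCII character:
$asciiRoundTrip = [Text.Encoding]::ASCII.GetString([Text.Encoding]::ASCII.GetBytes($text))
$asciiRoundTrip   # → h?

# Inside a script or function, this assignment creates a local variable,
# so the caller's $OutputEncoding is left alone.
$OutputEncoding = New-Object System.Text.UTF8Encoding $false

# With that in place, text piped to an external program is encoded as BOM-less UTF-8:
# $text | some-external-program   # hypothetical program name

# The same string now encodes to 3 bytes (0x68 0xC3 0xBC) instead of being degraded.
$OutputEncoding.GetBytes($text).Length   # → 3
```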

    [3] Of course, passing bytes through is not really an encoding; perhaps for that reason -Encoding Byte was removed from PowerShell Core, where -AsByteStream must be used instead.
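
As a sketch of the difference noted in [3], the following writes the same bytes with whichever parameter the running edition supports. The edition check via $PSVersionTable.PSEdition and the temp-file path are assumptions made for illustration.

```powershell
# Assumption for illustration: a throwaway file in the temp directory.
$path  = Join-Path ([System.IO.Path]::GetTempPath()) 'bytes-demo.txt'
$bytes = [Text.Encoding]::UTF8.GetBytes('h' + [char] 0xFC)   # "hü" as 3 UTF-8 bytes, no BOM

if ($PSVersionTable.PSEdition -eq 'Core') {
    # PowerShell Core: -Encoding Byte is gone; -AsByteStream takes its place.
    Set-Content -AsByteStream -LiteralPath $path -Value $bytes
} else {
    # Windows PowerShell (PSEdition is 'Desktop', or absent before v5.1).
    Set-Content -Encoding Byte -LiteralPath $path -Value $bytes
}

# The file should contain exactly the bytes that were passed in.
(Get-Item -LiteralPath $path).Length   # → 3
```

In PowerShell Core the byte detour is optional for text, since Set-Content -Encoding utf8NoBOM is available there.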