I have a PowerShell script that returns some strings via Write-Output. I would like those lines to be UTF-8 with no BOM. I do not want a global setting; I just want this to apply to the few lines I write at that time.
This other question helped me get to a point: Using PowerShell to write a file in UTF-8 without the BOM
I took inspiration from one of the answers, and wrote the following code:
$mystr = "test 1 2 3"
$mybytes = [Text.Encoding]::UTF8.GetBytes($mystr)
$OutStream = [console]::OpenStandardOutput()
$OutStream.Write($mybytes,0,$mybytes.Length)
$OutStream.Close()
However this code ONLY writes to stdout, and if I try to redirect it, it ignores my request. In other words, putting that code in test.ps1 and running test.ps1 >out.txt still prints to the console instead of to out.txt.
Could someone recommend how I could write this code so in case a user redirects the output of my PS to a file via >, that output is UTF8 with no BOM?
To add to Frode F.'s helpful answer:
What you were ultimately looking to achieve was to write a raw byte stream to PowerShell's success-output stream (the equivalent of stdout in traditional shells[0]), not to the console.
The success-output stream is what commands in PowerShell use to pass data to each other, including to the output-redirection operator >, at which point the console isn't involved.
(Data written to the success-output stream may end up displayed in the console, namely if the stream is neither captured in a variable nor redirected elsewhere.)
However, it is not possible to send raw byte streams to PowerShell's success output stream; only objects (instances of .NET types) can be sent, because PowerShell is fundamentally object-oriented.
Even data representing a stream of bytes must be sent as a .NET object, such as a [byte[]] array.
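As a small illustration of that point (a sketch only; the variable names are illustrative), the UTF-8 bytes of a string travel through the pipeline as individual [byte] objects rather than as a raw stream:

```powershell
# Build the string "hü" without depending on the script file's own encoding.
$text  = 'h' + [char]0xFC
$bytes = [Text.Encoding]::UTF8.GetBytes($text)

# The array is an ordinary .NET object ...
$bytes.GetType().FullName          # System.Byte[]

# ... and the pipeline enumerates it into individual [byte] objects.
$types = $bytes | ForEach-Object { $_.GetType().Name }
$types -join ','                   # Byte,Byte,Byte
```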
Sending a [byte[]] array directly to a file with > does not write the array's raw bytes, because > creates a "Unicode" (UTF-16LE-encoded[1]) text representation of the array (as you would see if you printed the array to the console).
In order to encode objects as byte streams (that are often encoded text) for external sinks such as a file, you need the help of PowerShell cmdlets (e.g., Set-Content), > (the output redirection operator), or the methods of appropriate .NET types (e.g., [System.IO.File]), except in 2 special cases:
- When piping to an external program, the encoding stored in the preference variable $OutputEncoding is implicitly used.
- When printing to the console, the encoding stored in [Console]::OutputEncoding is implicitly used; also, output from external programs is assumed to be encoded that way[2].
Generally, when it comes to text output, it is simpler to use the -Encoding parameter of output cmdlets such as Set-Content to let that cmdlet perform the encoding rather than trying to obtain a byte representation in a separate first step.
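A minimal sketch of letting the cmdlet perform the encoding, followed by a check for a UTF-8 BOM (bytes EF BB BF) at the start of the file. The temp-file name is hypothetical, and the result of the check depends on the PowerShell edition you run it in:

```powershell
$text = 'h' + [char]0xFC                                            # the string "hü"
$path = Join-Path ([IO.Path]::GetTempPath()) 'encoding-demo.txt'    # hypothetical file name

# Let Set-Content perform the encoding.
Set-Content -LiteralPath $path -Value $text -Encoding UTF8

# Inspect the first three bytes: EF BB BF is the UTF-8 BOM.
$bytes  = [IO.File]::ReadAllBytes($path)
$hasBom = $bytes.Length -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF
"BOM present: $hasBom"

Remove-Item -LiteralPath $path
```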
However, a BOM-less UTF-8 encoding cannot be selected this way in Windows PowerShell (it can in PowerShell Core), so using an explicit byte representation is an option, in combination with Set-Content -Encoding Byte[3]; e.g.:
# Write string "hü" to a UTF-8-encoded file *without BOM*:
[Text.Encoding]::UTF8.GetBytes('hü') |
Set-Content -Encoding Byte file.txt
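As mentioned above, the methods of .NET types such as [System.IO.File] are another route. The following is a sketch, not the only way to do it: it passes a System.Text.UTF8Encoding instance constructed with $false (no BOM) to WriteAllText. The temp-file name is hypothetical, and a full path is used because .NET's working directory can differ from PowerShell's current location:

```powershell
$text = 'h' + [char]0xFC                                              # the string "hü"
$path = Join-Path ([IO.Path]::GetTempPath()) 'utf8-nobom-demo.txt'    # hypothetical file name

# $false = do not emit a BOM when writing.
$utf8NoBom = New-Object System.Text.UTF8Encoding $false
[System.IO.File]::WriteAllText($path, $text, $utf8NoBom)

# Read the file back as raw bytes to confirm what was written.
$written = [System.IO.File]::ReadAllBytes($path)
$written -join ' '      # 104 195 188

Remove-Item -LiteralPath $path
```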
[0] Writing to stdout from within PowerShell, as you attempted, bypasses PowerShell's own system of output streams and prints directly to the console. (As an aside: Console.OpenStandardOutput() is designed to bypass redirections even in the context of traditional shells.)
[1] Up to PowerShell v5.0, you couldn't change the encoding used by >; in PSv5.1 and above, you can use something like $PSDefaultParameterValues['Out-File:Encoding']='UTF8' - that would still include a BOM, however. For background, see this answer of mine.
[2] There is a noteworthy asymmetry: on sending text to external programs, $OutputEncoding defaults to ASCII (7-bit only) encoding, which means that any non-ASCII characters get transliterated to literal ? characters; by contrast, on interpreting text from external programs, the applicable [Console]::OutputEncoding defaults to the system's active legacy OEM code page, which is an 8-bit encoding. See the list of code pages supported by Windows.
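To see which encodings are in effect in a given session, both values can simply be inspected; this is a sketch, and the names printed depend on your PowerShell edition and system configuration:

```powershell
# Encoding applied when piping text *to* external programs.
$toExternal = $OutputEncoding.WebName

# Encoding applied to console output and when decoding text *from* external programs.
$fromExternal = [Console]::OutputEncoding.WebName

"To external programs:   $toExternal"
"From external programs: $fromExternal"
```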
[3] Of course, passing bytes through is not really an encoding; perhaps for that reason -Encoding Byte was removed from PowerShell Core, where -AsByteStream must be used instead.
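A script that has to run in both editions could branch on the version, roughly as follows. This is a sketch under the assumption that a major version of 6 or higher means PowerShell Core; the temp-file name is hypothetical:

```powershell
$bytes = [Text.Encoding]::UTF8.GetBytes('h' + [char]0xFC)          # UTF-8 bytes of "hü"
$path  = Join-Path ([IO.Path]::GetTempPath()) 'bytes-demo.txt'     # hypothetical file name

if ($PSVersionTable.PSVersion.Major -ge 6) {
    # PowerShell Core: -Encoding Byte no longer exists.
    Set-Content -LiteralPath $path -Value $bytes -AsByteStream
} else {
    # Windows PowerShell 5.1 and earlier.
    Set-Content -LiteralPath $path -Value $bytes -Encoding Byte
}

# Read the file back to confirm the bytes were passed through unchanged.
$roundTrip = [IO.File]::ReadAllBytes($path)
$roundTrip -join ' '

Remove-Item -LiteralPath $path
```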