powershell  text  utf-8  character-encoding

PowerShell 5.1, Output to a text file with Out-File / Set-Content and utf8


I'm a bit stumped at how to interpret this in PS 5.1:

First, I try to create a file with Out-File and then append to it with Add-Content and Out-File -Append:

"# Script for running installers`n`n" | Out-File $InstallTools -Encoding utf8
"sleep 5" | Add-Content $InstallTools
"Start-Process -WindowStyle Hidden `"powershell.exe`" -ArgumentList `"-NoProfile -ExecutionPolicy Bypass -File `"`"$SetupScript`"`"`"" | 
  Out-File -Append $InstallTools

Output is a mess:

# Script for running installers
sleep 5
S t a r t - P r o c e s s   - W i n d o w S t y l e   H i d d e n   " p o w e r s h e l l . e x e "   - A r g u m e n t L i s t   " - N o P r o f i l e   - E x e c u t i o n P o l i c y   B y p a s s   - F i l e   " " C : \ U s e r s \ B o s s \ I n s t a l l   w i n f e t c h . p s 1 " " " 

The Add-Content call works fine, whereas the Out-File -Append call causes the file corruption.

The above output is what I see when I open the file in Notepad. In VS Code it's worse: after the sleep 5 line I see no text, just a load of nulls.

Why do the first 2 lines work fine, then the third is garbled, and how should I construct things to get meaningful output?

Edit: Believe I have resolved the problem, by adding -Encoding utf8 to the Out-File -Append lines. I read somewhere that the -Append and -Encoding switches should not be used together, but it seems they must be to get consistent output.


Solution

    "Believe I have resolved the problem, by adding -Encoding utf8 to the Out-File -Append lines."

    Indeed, that is the key to solving your problem, for the following reasons:

    • If you use Out-File -Append to append to a preexisting file, you must explicitly match the preexisting content's encoding with an -Encoding argument, namely -Encoding utf8 in your case.

      • In the absence of an -Encoding argument, Out-File uses its default encoding, which is "Unicode" (UTF-16LE) in Windows PowerShell (resulting in your symptom), and (BOM-less) UTF-8 in PowerShell (Core) 7+.

      • Note that > and >> are in effect aliases of Out-File and Out-File -Append, so the above applies to them too. You can control their output encoding only indirectly, via $PSDefaultParameterValues['Out-File:Encoding'] = 'utf8', which also affects explicit Out-File calls - see this answer for details.

    • By contrast, Add-Content tries to match the existing encoding, in both PowerShell editions, but what encoding it assumes in the absence of a BOM in the preexisting content differs by edition:

      • Windows PowerShell: ANSI (the equivalent of -Encoding Default), same as Set-Content
      • PowerShell 7+: UTF-8
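
A minimal sketch of the fix described above. The scratch file name and the $path variable are illustrative assumptions, not taken from the question. The file is created as UTF-8 and every later append states the same encoding, either per call or through the session-wide Out-File default:

```powershell
# Hypothetical scratch file; substitute your own path.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'encoding-demo.ps1'

# Create (or overwrite) the file as UTF-8; Windows PowerShell 5.1 writes a BOM here.
"# Script for running installers" | Out-File $path -Encoding utf8

# Append with the SAME encoding. Without -Encoding, Windows PowerShell
# would append UTF-16LE bytes to the UTF-8 file.
"sleep 5" | Out-File $path -Append -Encoding utf8

# Alternative: set a default for Out-File, which > and >> also pick up.
# This stays in effect for the rest of the session.
$PSDefaultParameterValues['Out-File:Encoding'] = 'utf8'
"Write-Output 'done'" >> $path

# Read the file back as UTF-8 to confirm that every line is intact.
$lines = Get-Content $path -Encoding utf8
$lines
```

If the appended lines come back readable, with no spaced-out characters, the encodings match. Removing the $PSDefaultParameterValues entry afterwards restores the edition's default.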

    The upshot with respect to your scenario is:

    • Since you're running Windows PowerShell and used -Encoding utf8 to initially create your file, that file has a BOM (there is no -Encoding argument that would allow creation of BOM-less UTF-8 files in Windows PowerShell - see this answer for background information and workarounds).

    • Thanks to the presence of a BOM, Add-Content was therefore able to unambiguously match the existing encoding, whereas Out-File -Append blindly used its Windows PowerShell default, UTF-16LE.

    • With text as input, pairing Set-Content (for the initial file creation) with Add-Content (for appending) has fewer surprises and performs better than Out-File, and the Add-Content calls don't need -Encoding arguments.

      • However, in Windows PowerShell you need to use -Encoding utf8 with Set-Content to ensure UTF-8 encoding (invariably with a BOM).

    • None of the above problems would arise in PowerShell (Core) 7+, which consistently defaults to (BOM-less) UTF-8, across all cmdlets; if you do want a BOM, use -Encoding utf8BOM.
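
The Set-Content / Add-Content pairing recommended above could look roughly like this. The scratch file name, $path and $hasBom are illustrative assumptions. The byte check at the end is only a way to see whether a BOM was written, which depends on the edition:

```powershell
# Hypothetical scratch file; substitute your own path.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'set-content-demo.ps1'

# Create the file explicitly as UTF-8; in Windows PowerShell this writes a BOM.
"# Script for running installers" | Set-Content $path -Encoding utf8

# No -Encoding needed here: Windows PowerShell detects the BOM,
# and PowerShell 7+ assumes UTF-8 for BOM-less files.
"sleep 5" | Add-Content $path

# Check the first three bytes for a UTF-8 BOM (EF BB BF).
$bytes  = [System.IO.File]::ReadAllBytes($path)
$hasBom = $bytes.Length -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF
"UTF-8 BOM present: $hasBom"
```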