Search code examples
node.jspowershellencoding

Bad Encoding when redirecting output of nodejs program to file (windows 10 powershell possible issue)


I have a simple javascript file (let's call it index.js) with the following:

console.log('pérola');

I use VSCode on windows 10 and it uses as terminal the powershell, when I execute the file using:

node index.js

I get the following output:

pérola

If I run the following:

node index.js > output.txt

I get the following on the file:

p├®rola

It seems there is some issue with the encoding of powershell when writing to files, when I open the file on VSCode I can see on the bottom right that the encoding is UTF-16 LE.

I also already tried the following:

node index.js | out-file -encoding utf8 output.txt

The file is saved in UTF8 with BOM but still with wrong encoding since what I see is p├®rola and not pérola

Can someone explain me what is wrong here? Thank you.


Solution

  • What node outputs is UTF-8-encoded.

    PowerShell's > operator does not pass the underlying bytes through to the output file. [Update: in PowerShell (Core) v7.4+ it now does.]

    Instead, PowerShell converts the bytes output by node into .NET strings based on the encoding stored in [Console]::OutputEncoding and then saves the resulting strings based on the encoding implied by the > operator, which is - effectively, not technically - an alias of the Out-File cmdlet.

    In other words: for PowerShell to properly interpret node's output you must (temporarily) set [Console]::OutputEncoding to [System.Text.Utf8Encoding]::new().

    Additionally, you must then decide what character encoding you want the output file to have, by using Out-File -Encoding or - preferably, if the input is text already - Set-Content -Encoding instead of >.
    That is, you need to do this unless > / Out-File's default character encoding works for you: it is "Unicode" (UTF16-LE) in Windows PowerShell, and BOM-less UTF-8 in PowerShell [Core] v6+.

    See also:

    • This answer for background information on how to make PowerShell console windows use UTF-8 consistently when communication with external programs[1], both when sending data to external programs ($OutputEncoding) and when interpreting data from external programs ([Console]::OutputEncoding):

      • In short, place the following statement in your $PROFILE:

        $OutputEncoding = [Console]::InputEncoding = [Console]::OutputEncoding = [System.Text.UTF8Encoding]::new()
        
      • If you're running in the - obsolescent - Windows PowerShell ISE, you need an additional command to ensure that the ISE first allocates a hidden console behind the scenes; note that in the recommended replacement, Visual Studio Code with its PowerShell extension, this is not necessary:

        $null = chcp # Run any console application to force the ISE to create a console.
        $OutputEncoding = [Console]::InputEncoding = [Console]::OutputEncoding = [System.Text.UTF8Encoding]::new()
        
    • This answer for a system-wide way to make non-Unicode (console) applications use UTF-8, available in recent versions of Windows 10. This makes both cmd.exe and PowerShell use UTF-8 by default.[1]

      • Caveat: This feature is still in beta a of Windows 11 22H2, and it has far-reaching consequences - see the linked answer.

    [1] What encoding PowerShell's own cmdlets use is not controlled by this; PowerShell cmdlets have their own defaults, which are - unfortunately - inconsistent in Windows PowerShell, whereas in PowerShell [Core] v6+ (BOM-less) UTF-8 is the consistent default; see this answer.