
Python - Get command output cannot be decoded


I'm currently working on a project where I need to run a command in PowerShell, and part of the output is not in English (specifically, Hebrew).

For example (a simplified version of the problem), if I want to get the content of my desktop, and there is a filename in Hebrew:

import subprocess
command = "powershell.exe ls ~/Desktop"
print (subprocess.run(command.split(), stdout=subprocess.PIPE).stdout.decode())

This code raises the following error (or something similar with a different byte value):

UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 19: invalid start byte

I tried running it on a different computer, and this was the output:

?????

Any idea why that is and how I can fix it? I tried many things I saw in other questions, but none of them worked for me.


Solution

  • Note: The following are Python 3+ solutions, but there is a caveat:

    • With Option A below, and with Option B if UTF-8 data must also be sent to PowerShell's stdin stream, a bug in powershell.exe (the Windows PowerShell CLI) causes the current console window to switch to a raster font (potentially with a different font size) that does not support most Unicode characters outside the extended ASCII range. While visually jarring, this is merely a display (rendering) problem: the data is handled correctly, and switching back to a Unicode-aware font such as Consolas reveals the correct output.

    • By contrast, pwsh.exe, the PowerShell (Core) (v6+) CLI, does not exhibit this problem.


    Option A: Configure both the console and Python to use UTF-8 character encoding before executing your script:

    • Configure the console to use UTF-8:

      • From cmd.exe, by switching the active OEM code page to 65001 (UTF-8); note that this change potentially affects all later calls to console applications in the session, independently of Python, unless you restore the original code page (see Option B below):

        chcp 65001
        
      • From PowerShell:

        $OutputEncoding = [Console]::InputEncoding = [Console]::OutputEncoding = [System.Text.UTF8Encoding]::new()
        
    • And configure Python (v3+) to use UTF-8 consistently:[1]

      • Set environment variable PYTHONUTF8 to 1, possibly persistently, via the registry; to do it ad hoc:

        • From cmd.exe:

          Set PYTHONUTF8=1
          
        • From PowerShell:

          $env:PYTHONUTF8=1
          
      • Alternatively, for an individual call (v3.7+): Pass command-line option -X utf8 to the python interpreter (note: case matters):

          python -X utf8 somefile.py ...
        
      • Both options enable Python UTF-8 Mode, which will become the default in Python 3.15.

    Now, your original code should work as-is (except for the display bug).
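To check that UTF-8 Mode is actually in effect, you can inspect sys.flags.utf8_mode. The following is a minimal sketch, not part of the original answer: the helper name utf8_mode_of_child is hypothetical, and it simply starts a child interpreter with either -X utf8 or PYTHONUTF8=1 and reports the flag that the child sees.

```python
import os
import subprocess
import sys


def utf8_mode_of_child(extra_args=(), extra_env=None):
    """Start a child Python interpreter and return its sys.flags.utf8_mode.

    Hypothetical helper for illustration only.
    """
    env = dict(os.environ)
    env.pop("PYTHONUTF8", None)  # start from a known state
    if extra_env:
        env.update(extra_env)
    result = subprocess.run(
        [sys.executable, *extra_args, "-c",
         "import sys; print(sys.flags.utf8_mode)"],
        stdout=subprocess.PIPE,
        env=env,
        check=True,
    )
    # The child prints a single ASCII digit, so ASCII decoding is safe here.
    return int(result.stdout.decode("ascii").strip())


if __name__ == "__main__":
    # State of the current process.
    print("this process:", sys.flags.utf8_mode, sys.stdout.encoding)
    # Both ways of enabling UTF-8 Mode described above.
    print("-X utf8:", utf8_mode_of_child(extra_args=["-X", "utf8"]))
    print("PYTHONUTF8=1:", utf8_mode_of_child(extra_env={"PYTHONUTF8": "1"}))
```

If either of the last two lines does not report 1, the option did not reach the interpreter (for example, because of a typo in the option's case).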

    Note:

    • A simpler alternative via a one-time configuration step is to configure your system to use UTF-8 system-wide, in which case both the OEM and the ANSI code pages are set to 65001. However, this has far-reaching consequences - see this answer.

    Option B: (Temporarily) switch to UTF-8 for the PowerShell call:

    import sys, ctypes, subprocess

    # Switch Python's own encoding to UTF-8, if necessary.
    # This is roughly the in-script equivalent of setting environment
    # variable PYTHONUTF8 to 1 *before* calling the script.
    for stream in (sys.stdin, sys.stdout, sys.stderr):
        stream.reconfigure(encoding='utf-8')

    # Save the current console output code page and switch to 65001 (UTF-8).
    # Note: windll must be accessed via the ctypes module (Windows only).
    kernel32 = ctypes.windll.kernel32
    previousCp = kernel32.GetConsoleOutputCP()
    kernel32.SetConsoleOutputCP(65001)

    try:
        # PowerShell now emits UTF-8-encoded output; decode it as such.
        command = "powershell.exe ls ~/Desktop"
        print(subprocess.run(command, stdout=subprocess.PIPE).stdout.decode('utf-8'))
    finally:
        # Restore the previous console output code page, even on failure.
        kernel32.SetConsoleOutputCP(previousCp)
    

    Note:

    • Because only the console output code page is changed, the Windows PowerShell display bug is avoided.
    • If you also wanted to send input to PowerShell's stdin stream, you'd have to set the console input code page too, via ctypes.windll.kernel32.SetConsoleCP(65001) (which would then again surface the display bug).
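The save / switch / restore sequence of Option B can also be wrapped in a context manager, so that the previous code page is restored even if the PowerShell call fails. This is a sketch under stated assumptions rather than part of the original answer: the name console_output_code_page and the kernel32 parameter are hypothetical, and the parameter exists only so that the logic can be exercised without a Windows console.

```python
import contextlib

UTF8_CODE_PAGE = 65001


@contextlib.contextmanager
def console_output_code_page(code_page=UTF8_CODE_PAGE, kernel32=None):
    """Temporarily switch the console output code page (hypothetical helper).

    kernel32 defaults to ctypes.windll.kernel32, which exists only on Windows.
    """
    if kernel32 is None:
        import ctypes  # imported lazily; windll is Windows-only
        kernel32 = ctypes.windll.kernel32
    previous = kernel32.GetConsoleOutputCP()
    kernel32.SetConsoleOutputCP(code_page)
    try:
        yield previous
    finally:
        # Always restore the code page that was active before.
        kernel32.SetConsoleOutputCP(previous)


# Usage on Windows (not executed here):
#
#   import subprocess
#   with console_output_code_page():
#       command = "powershell.exe ls ~/Desktop"
#       raw = subprocess.run(command, stdout=subprocess.PIPE).stdout
#       print(raw.decode("utf-8"))
```

As in Option B, only the output code page is touched, so sending input to PowerShell's stdin would additionally require SetConsoleCP.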

    [1] This isn't strictly necessary just for correctly decoding PowerShell's output, but matters if you want to pass that output on from Python: Python 3.x defaults to the active ANSI(!) code page for encoding non-console output, which means that Hebrew characters, for instance, cannot be represented in non-console output (e.g., when redirecting to a file), and cause the script to break.
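Both failure directions can be reproduced without PowerShell, using Python's built-in codecs. The sketch below is illustrative only: the helper names are hypothetical, and the code pages are assumptions (cp1252 stands in for a Western ANSI code page, cp1255 for the Hebrew ANSI code page, and cp862 for a Hebrew OEM console code page); your system's actual code pages may differ.

```python
HEBREW_SAMPLE = "שלום"


def can_encode(text, encoding):
    """Return True if text can be represented in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def decodes_as_utf8(data):
    """Return True if the bytes are valid UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    # Encoding direction (the footnote's case): a Western ANSI code page
    # has no Hebrew letters, while the Hebrew one and UTF-8 do.
    for encoding in ("cp1252", "cp1255", "utf-8"):
        print(encoding, can_encode(HEBREW_SAMPLE, encoding))

    # Decoding direction (the question's error): bytes produced under an
    # assumed OEM code page 862 are not valid UTF-8.
    oem_bytes = HEBREW_SAMPLE.encode("cp862")
    print("cp862 bytes valid UTF-8:", decodes_as_utf8(oem_bytes))
```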