Tags: python, python-3.x, subprocess

get text from stdout in subprocess.run without UnicodeDecodeError


I read the output from a windows command line call like so:

result = subprocess.run(["cmd", "/c", "dir","c:\mypath"], stdout=subprocess.PIPE, text=True,check=True)

The result may contain unexpected characters, and I get a UnicodeDecodeError. I tried to sanitize it with text = result.stdout.encode('ascii','replace').decode('ascii') but this doesn't always help.

How do I robustly read the text avoiding any UnicodeDecodeError?


Solution

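  • If you cannot rely on the subprocess to produce text in the encoding Python expects, don't use text=True; capture bytes instead. With text=True the decoding happens inside subprocess.run itself, so the UnicodeDecodeError is raised before result.stdout exists and sanitizing it afterwards cannot help. The onus is then on you to work out the encoding when you need to decode the value.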

    result = subprocess.run(
        ["cmd", "/c", "dir", r"c:\mypath"],
        capture_output=True, check=True)
    print(result.stdout.decode("cp1252"))  # or whatever encoding the system is using
    

    If you can predict the expected encoding, you can also say

    result = subprocess.run(
        ["cmd", "/c", "dir", r"c:\mypath"],
        capture_output=True, check=True, encoding="cp1252")
    

    By the looks of it, you are on Windows; check the active code page (what does chcp print in a CMD window?) and adjust accordingly. The number it reports typically maps to a Python codec name of the form cp<number>, for example cp850 or cp437.

    (Notice also the use of a raw string for any string value with a literal backslash in it.)

    And of course, to merely get a directory listing, probably prefer os.scandir() or its pathlib equivalent.
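
If the encoding cannot be known for certain, one way to rule out UnicodeDecodeError entirely is to capture bytes and decode them with an error handler. The following is a minimal sketch, not a definitive recipe: decode_output is a hypothetical helper name, the fallback to the locale's preferred encoding is an assumption about what the child process writes, and a small Python child process stands in for the dir call so the example runs on any platform.

```python
import locale
import subprocess
import sys


def decode_output(raw, encoding=None):
    """Decode captured bytes; undecodable bytes become U+FFFD instead of raising."""
    if encoding is None:
        # Assumption: the child wrote in the locale's preferred encoding.
        encoding = locale.getpreferredencoding(False)
    return raw.decode(encoding, errors="replace")


# Portable stand-in for the `dir` call: the child emits byte 0x81,
# which is undefined in cp1252 and would fail a strict decode.
result = subprocess.run(
    [sys.executable, "-c", "import sys; sys.stdout.buffer.write(b'caf\\x81')"],
    capture_output=True, check=True)

text = decode_output(result.stdout, "cp1252")
print(repr(text))
```

The same error handler can also be passed straight to subprocess.run via its errors parameter, for example encoding="cp1252", errors="replace", in which case result.stdout is already a str. Replacement is lossy, so it suits display and logging better than further processing of file names.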
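
For the directory listing itself, here is a minimal sketch of the os.scandir() and pathlib approaches mentioned above. Both return file names as str directly, so no console output has to be decoded. The helper names and the demo file names are hypothetical, and a temporary directory is used in place of c:\mypath.

```python
import os
import tempfile
from pathlib import Path


def list_names(path):
    """Return the sorted entry names in `path` using os.scandir()."""
    with os.scandir(path) as entries:
        return sorted(entry.name for entry in entries)


def list_names_pathlib(path):
    """Return the same listing using pathlib's Path.iterdir()."""
    return sorted(child.name for child in Path(path).iterdir())


# Demo on a throwaway directory with two made-up files.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("x")
    Path(tmp, "b.txt").write_text("y")
    names = list_names(tmp)
    names_pathlib = list_names_pathlib(tmp)

print(names)
```

os.scandir() also exposes entry.is_dir() and entry.stat() if sizes or timestamps from the dir output are needed.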