I read the output from a Windows command-line call like so:
result = subprocess.run(["cmd", "/c", "dir","c:\mypath"], stdout=subprocess.PIPE, text=True,check=True)
The result may contain unexpected characters, and I get a UnicodeDecodeError. I tried to sanitize it with text = result.stdout.encode('ascii','replace').decode('ascii'), but this doesn't always help.
How do I robustly read the text avoiding any UnicodeDecodeError?
If you cannot rely on the subprocess to produce valid text, don't use text=True; but then the onus is on you to figure out the encoding when you need to decode the value.
result = subprocess.run(
["cmd", "/c", "dir", r"c:\mypath"],
capture_output=True, check=True)
print(result.stdout.decode("cp1252")) # or whatever encoding the system is using
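If you don't know the encoding up front, one option is to try a few candidate encodings strictly and only fall back to lossy decoding if none of them fit. The helper below is a hypothetical sketch, not a standard API; the candidate list ("utf-8", then "cp1252") is an assumption you should adapt to your system. UTF-8 goes first because it rejects most invalid byte sequences, whereas single-byte code pages such as cp850 accept almost any byte and so would "succeed" on garbage.

```python
def decode_best_effort(data, candidates=("utf-8", "cp1252")):
    """Hypothetical helper: decode bytes without ever raising UnicodeDecodeError.

    Tries each candidate encoding strictly, in order; if none decodes the
    data cleanly, decodes with the first candidate and replaces the
    undecodable bytes with U+FFFD.
    """
    for encoding in candidates:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue  # this encoding does not fit; try the next one
    # Last resort: lossy decode instead of raising.
    return data.decode(candidates[0], errors="replace")


print(decode_best_effort(b"caf\xc3\xa9"))  # valid UTF-8
print(decode_best_effort(b"caf\xe9"))      # not UTF-8, but valid cp1252
```

With the output from above you would call decode_best_effort(result.stdout). A strict match is still only a guess: bytes that happen to be valid in the wrong encoding decode without error but may show the wrong characters.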
If you can predict the expected encoding, you can also pass it directly:
result = subprocess.run(
["cmd", "/c", "dir", r"c:\mypath"],
capture_output=True, check=True, encoding="cp1252")
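subprocess.run also accepts an errors= argument alongside encoding=. With errors="replace", bytes that don't fit the chosen encoding become U+FFFD instead of raising UnicodeDecodeError. The sketch below uses a hypothetical helper name, run_lenient, and a portable child process (the current Python interpreter writing one non-UTF-8 byte) instead of cmd, so it can run anywhere. The UTF-8 default is only a placeholder.

```python
import subprocess
import sys


def run_lenient(args, encoding="utf-8"):
    """Hypothetical helper: run a command and return stdout as text.

    Undecodable bytes are replaced with U+FFFD instead of raising.
    """
    result = subprocess.run(
        args, capture_output=True, check=True,
        encoding=encoding, errors="replace")
    return result.stdout


# Demo child: writes the byte 0xE9, which is not valid UTF-8 on its own.
child_code = "import sys; sys.stdout.buffer.write(b'caf\\xe9\\n')"
text = run_lenient([sys.executable, "-c", child_code])
print(repr(text))  # the 0xE9 byte comes back as U+FFFD
```

On Windows, the call from the question would then be run_lenient(["cmd", "/c", "dir", r"c:\mypath"], encoding="cp850"), with cp850 standing in for whatever code page your system actually reports. This never raises, but any replaced characters are lost, so picking the right encoding is still worth the effort.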
By the looks of it, you are on Windows; probably examine your current system encoding (what's the output of chcp in a CMD window?) and adjust accordingly.
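To turn what chcp reports into something you can pass as encoding=, here is a small sketch. It assumes the output contains the code page number (for example "Active code page: 850" on an English system; the wording is localized elsewhere) and that Windows code page N corresponds to the Python codec "cpN", with 65001 mapped to UTF-8. The helper name codec_from_chcp is hypothetical. codecs.lookup raises LookupError if Python has no codec by that name.

```python
import codecs
import re
import subprocess
import sys


def codec_from_chcp(chcp_output):
    """Hypothetical helper: map chcp output to a Python codec name.

    Assumes the first run of digits in the output is the code page number.
    """
    match = re.search(r"(\d+)", chcp_output)
    if match is None:
        raise ValueError(f"no code page number in {chcp_output!r}")
    page = int(match.group(1))
    # 65001 is the Windows code page number for UTF-8.
    name = "utf-8" if page == 65001 else f"cp{page}"
    codecs.lookup(name)  # raises LookupError if Python lacks this codec
    return name


if sys.platform == "win32":
    # Ask cmd itself which code page it is using right now.
    raw = subprocess.run(["cmd", "/c", "chcp"],
                         capture_output=True, check=True).stdout
    print(codec_from_chcp(raw.decode("ascii", errors="replace")))
```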
(Notice also the use of a raw string for any string value with a literal backslash in it.)
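Why the raw string matters: in an ordinary string literal a backslash starts an escape sequence, so a path such as "c:\temp\new" silently contains a tab and a newline. "c:\mypath" only happens to survive because \m is not a recognized escape, and recent Python versions warn about such invalid escapes. The paths below are illustrative only.

```python
plain = "c:\temp\new"   # "\t" is a tab, "\n" is a newline
raw = r"c:\temp\new"    # backslashes kept literally

print(len(plain), len(raw))  # 9 11
print(raw == "c:\\temp\\new")  # True: doubling the backslashes also works
```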
And of course, to merely get a directory listing, probably prefer os.scandir() or its pathlib equivalent.
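A minimal sketch of both approaches: no subprocess is involved, and the OS API returns file names as str directly, so there is no code page to guess. The function names are illustrative, and sorting the names is just a choice for stable output.

```python
import os
from pathlib import Path


def list_with_scandir(path):
    """Return the sorted entry names in path using os.scandir()."""
    with os.scandir(path) as entries:
        return sorted(entry.name for entry in entries)


def list_with_pathlib(path):
    """Return the same listing using pathlib.Path.iterdir()."""
    return sorted(p.name for p in Path(path).iterdir())


# On Windows you would pass r"c:\mypath"; "." keeps the demo portable.
print(list_with_scandir("."))
```

os.scandir() also gives you is_dir() and stat() on each entry, which covers most of what you would otherwise parse out of dir output.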