Search code examples
windowspython-3.xsubprocesspopenstderr

Why's UnicodeDecodeError hidden until after adding time.sleep(1)?


EDITED2:

In the EDITED code below, f_out.write(bytearray(out or "")) should be replaced (both times) with:
f_out.write(bytearray((out or ""), 'utf8')) # BEFORE removing universal_newlines=True
OR
f_out.write(out or "")# AFTER removing universal_newlines=True


msw, tdelaney, and j-f-sebastian - Thank you so much for all your help!

EDITED - As a result, here's the edited version of my script which NOW CONSISTENTLY TRIGGERS UnicodeDecodeError:

#!python3  # Run this script with Python 3.x (in Windows, assuming pylauncher is installed).
import subprocess
import sys

sys.stderr = sys.stdout = open('std.outerr', 'w')
# Redirected stdout/stderr so that they can be seen even when script is not run from command line.
child = subprocess.Popen([r"Evince\bin\Evince.exe", "fuzzed.pdf"], bufsize=0,
                         stdin=subprocess.DEVNULL, stdout=subprocess.PIPE,
                         stderr=subprocess.STDOUT, universal_newlines=True)
# `universal_newlines=True` TEMPORARILY left in to show that UnicodeDecodeError is triggered.
# `universal_newlines=True` WILL be removed from FINAL script.
try:
    (out, _) = child.communicate(timeout=5)
# 1 second wasn't long enough for UnicodeDecodeError to consistently be triggered.
# Since subprocess's stderr was redirected to its stdout, 2nd element of tuple will be `None`.
except subprocess.TimeoutExpired:
    child.kill()
    (out, _) = child.communicate()  # Try a 2nd time, without timeout.
    with open('subprocess.out', 'wb') as f_out:
        f_out.write(bytearray(out or ""))  # Treat `None` as an empty string).
else:
    print("\nERROR: A crash occurred before the timeout expired!\n")
    with open('subprocess.out', 'wb') as f_out:
        f_out.write(bytearray(out or ""))

EDITED - And now (with the script above, minus the universal_newlines=True), the 1.2MB, 18,978 line stderr which Evince generated is correctly captured:

Error: PDF file is damaged - attempting to reconstruct xref table... Error: Kid object (page 1) is not an indirect reference (integer)

....................................................................
(Evince.exe:6800): GLib-GObject-CRITICAL **: g_object_unref: assertion `G_IS_OBJECT (object)' failed

For some fuzzing I'm doing, the subprocess.Popen() call below:

import subprocess
proc = subprocess.Popen([r"Evince\bin\Evince.exe", "fuzzed.pdf"],
                         stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                         universal_newlines=True)
try:
    proc.communicate(timeout=1)  # Works the same with timeout=60 seconds.
except subprocess.TimeoutExpired:  # This exception is new to Python 3.3.
    proc.kill()
    # Other code here.
else:
    print("\nERROR: A crash occurred before the timeout expired!\n")

gave me a UnicodeDecodeError:

Exception in thread Thread-1: Traceback (most recent call last):  
File "p:\python35-64\lib\threading.py", line 914, in _bootstrap_inner self.run()   
File "p:\python35-64\lib\threading.py", line 862, in run self._target(*self._args, **self._kwargs)   
File "p:\python35-64\lib\subprocess.py", line 1279, in _readerthread buffer.append(fh.read())
File "p:\python35-64\lib\encodings\cp1252.py", line 23, in decode return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 291289:  character maps to [undefined]

That occurred even when I simplified the "Other code" to something as simple as time.sleep(1). Yet when I removed the "Other code", no exception occured.

I realize now that the exception occurred because of my unnecessarily specifying universal_newlines=True on the Popen() call. [That's not compatible with a byte being written to stderr with a value greater than 127 (which was occcuring).]

However, because the exception only occurs when there's some "Other code" after my proc.kill(), it would seem that something else still is likely not quite right with my code. So, I temporarily left universal_newlines=True in my code, and omitted my "Other code", in order to be better able to determine what that is.

I tried changing buf_size and tried flush()ing both stdout and stderr, but none of that seems to make any difference.

I saw in the Python docs that:

Popen objects are supported as context managers via the with statement: on exit, standard file descriptors are closed, and the process is waited for.

so I tried replacing my Popen() call with:

with subprocess.Popen(..., universal_newlines=True) as proc:

and that generated the UnicodeDecodeError, even with no "Other code" present. So, that's 1 way to "fix" my code, but (because of some additional things I need to do), I'd ideally like to use the 3rd-party PyPIpsutil module. And, regrettably doesn't currently support context managers. So, if possible, I'd like to code this without with ... as.

What else (other than the value of universal_newlines) could I change in my code, to "fix" it?

Based on what the docs said about "'Popen' objects are supported as context managers", I tried adding:

if proc.stdout:
    proc.stdout.close()
if proc.stderr:
    proc.stderr.close()
if proc.stdin:
    proc.stdin.close()

and/or proc.wait() just before my proc.kill(), but then the proc.kill was never reached.

What's with ... asdoing that I should be doing?

Thanks in advance.


Solution

  • The output may be buffered and therefore the text can be decoded even after the child process is already dead. If there is no time.sleep(1) then the parent might exit before the decoding has encountered the error (I/O reader daemon threads started by .communicate() are killed then the parent process exits).