
Are the subprocess/commands modules Unicode aware with Python2?

Is the stdout/stderr captured from the various functions in the subprocess or the commands module for Python2 guaranteed to be a standard string, or is it possible under certain conditions that a Unicode object is returned? ... And if a standard Python2 string is returned, what happens if the subprocess outputs Unicode?

Or getting to the point more directly, what's the best way to robustly handle a Python2 subprocess call which may output Unicode characters?

And would the answer be substantially different in Python 3?

Solution

A subprocess's output will always† be bytes, in both Python 2 (where the type is called "str") and Python 3 (where it is called "bytes"). A subprocess cannot output unicode, because a "unicode object" is a concept internal to Python. What comes through the pipe is always bytes.
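As a minimal sketch, the following launches a hypothetical child (the current interpreter via sys.executable, so it runs anywhere Python does) that writes the UTF-8 bytes for "café" straight to file descriptor 1. The captured value is a byte string on both Python 2.7 and Python 3. The CHILD_SRC program and the run_child helper are illustrative only.

```python
import subprocess
import sys

# Hypothetical child program: writes the UTF-8 encoding of "café" directly
# to file descriptor 1, bypassing any text layer inside the child.
CHILD_SRC = "import os; os.write(1, b'caf\\xc3\\xa9')"

def run_child():
    """Return exactly the bytes the child wrote to stdout."""
    return subprocess.check_output([sys.executable, "-c", CHILD_SRC])

out = run_child()
print(type(out))  # <type 'str'> on Python 2, <class 'bytes'> on Python 3
print(repr(out))
```

Because bytes is simply an alias of str on Python 2, isinstance(out, bytes) is true on both versions, which makes it a convenient version-neutral check.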

If the bytes are a representation of textual data, then you have to know what encoding is used by the subprocess before you can decode the output. Different subprocesses could output different encodings, so there is no one correct answer here.
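Here is a sketch of the "you must know the encoding" point, assuming a hypothetical child that emits "café" as Latin-1 (one 0xE9 byte). The run_text helper is made up for illustration. It simply takes the encoding from the caller, because subprocess has no way to detect it. Decoding with the right codec works. Guessing UTF-8 either raises UnicodeDecodeError or, with errors="replace", silently substitutes U+FFFD.

```python
import subprocess
import sys

def run_text(cmd, encoding, errors="strict"):
    """Run cmd and decode its stdout with an encoding the caller must supply.

    `encoding` is an assumption about the child program; subprocess itself
    cannot tell which encoding the child used.
    """
    raw = subprocess.check_output(cmd)
    return raw.decode(encoding, errors)

# Hypothetical child that writes "café" encoded as Latin-1.
LATIN1_CHILD = [sys.executable, "-c", "import os; os.write(1, b'caf\\xe9')"]

text = run_text(LATIN1_CHILD, "latin-1")
print(text == u"caf\xe9")  # True: correct encoding

# Wrong guess, lenient mode: the undecodable byte becomes U+FFFD.
lossy = run_text(LATIN1_CHILD, "utf-8", "replace")

# Wrong guess, strict mode: decoding fails outright.
try:
    run_text(LATIN1_CHILD, "utf-8")
except UnicodeDecodeError as exc:
    print("wrong guess: " + exc.reason)
```

Whether to fail loudly (strict) or degrade (replace) is a design choice. Strict is usually safer when you believe you know the encoding, since a wrong guess surfaces immediately.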

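† There is one weird edge case to be aware of here. In Python 3, if you launch the subprocess with the keyword argument universal_newlines=True, the output is automatically decoded using the encoding returned by locale.getpreferredencoding(False), so you get str instead of bytes. Since Python 3.6 you can also pass encoding= (and errors=) to choose the codec explicitly. In Python 2, universal_newlines=True only enables newline translation, and the output is still a byte string.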
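A Python 3.6+ sketch of that edge case, with two hypothetical child programs. The implicit, locale-based decoding is shown on ASCII output only, because non-ASCII output could fail to decode under an ASCII locale. Passing encoding= avoids depending on the locale at all.

```python
import locale
import subprocess
import sys

# Hypothetical child programs, run with the current interpreter for portability.
ASCII_CHILD = [sys.executable, "-c", "import os; os.write(1, b'hello')"]
UTF8_CHILD = [sys.executable, "-c", "import os; os.write(1, b'caf\\xc3\\xa9')"]

# The encoding that text mode falls back to on this machine.
print("locale encoding:", locale.getpreferredencoding(False))

# Python 3: universal_newlines=True decodes stdout with the locale encoding,
# so the result is str rather than bytes (on Python 2 it stays a byte string).
implicit = subprocess.check_output(ASCII_CHILD, universal_newlines=True)
print(type(implicit), repr(implicit))

# Python 3.6+: name the encoding yourself instead of trusting the locale.
explicit = subprocess.check_output(UTF8_CHILD, encoding="utf-8")
print(explicit == "caf\u00e9")  # True
```

If you know what the child emits, the explicit form is usually the more robust choice, because the locale can differ between your development machine, cron jobs, and containers.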