What is the difference between using universal_newlines=True (with bufsize=1) and using default arguments with Popen?

I am trying to read the output of a subprocess called from Python. To do this I am using Popen (because I do not think it is possible to pipe stdout if using subprocess.call).

As of now I have two ways of doing it which, in testing, seem to provide the same results. The code is as follows:

with Popen(['Robocopy', source, destination, '/E', '/TEE', '/R:3', '/W:5', '/log+:log.txt'], stdout=PIPE) as Robocopy:
    for line in Robocopy.stdout:
        line = line.decode('ascii')
        message_list = [item.strip(' \t\n').replace('\r', '') for item in line.split('\t') if item != '']
        print(message_list[0], message_list[2])
    Robocopy.wait()
    returncode = Robocopy.returncode

and

with Popen(['Robocopy', source, destination, '/E', '/TEE', '/R:3', '/W:5', '/log+:log.txt'], stdout=PIPE, universal_newlines=True, bufsize=1) as Robocopy:
    for line in Robocopy.stdout:
        message_list = [item.strip() for item in line.split('\t') if item != '']
        print(message_list[0], message_list[2])
    Robocopy.wait()
    returncode = Robocopy.returncode

The first method does not include bufsize=1, as the documentation states that line buffering is only usable if universal_newlines=True, i.e., in text mode.

The second version does include universal_newlines and therefore I specify a bufsize.

Can somebody explain the difference to me? I can't find the article but I did read about issues with an overflowing buffer causing some sort of issue and thus the importance of using for line in stdout.

Additionally, when looking at the output, not specifying universal_newlines makes stdout yield bytes objects, but I am not sure what difference that makes (in terms of newlines and tabs) if I just decode the bytes with ascii, compared to using universal_newlines mode.

Lastly, setting the bufsize to 1 makes the output "line-buffered" but I am not sure what that means. I would appreciate an explanation about how these various elements tie together. Thanks

Solution

"What is the difference between using universal_newlines=True (with bufsize=1) and using default arguments with Popen"

The default values are universal_newlines=False and bufsize=-1. universal_newlines=False means that input/output is accepted as bytes, not Unicode strings, and that universal newlines mode handling is disabled (hence the name of the parameter; Python 3.7 provides the text alias, which might be more intuitive here). You get binary data as is (unless the POSIX layer on Windows messes it up). bufsize=-1 means the streams are fully buffered, using the default buffer size.

universal_newlines=True uses the locale.getpreferredencoding(False) character encoding to decode bytes (this may differ from the ascii encoding used in your code).
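To make the bytes/text distinction concrete, here is a minimal sketch. It assumes a stand-in child process (the current Python interpreter printing two lines) instead of Robocopy, so the names child, raw_lines, and text_lines are illustrative only.

```python
import locale
import subprocess
import sys

# Stand-in child process (an assumption for illustration, not Robocopy):
# the current interpreter printing two lines.
child = [sys.executable, "-c", "print('first'); print('second')"]

# Default mode: stdout yields bytes, with the newline bytes left as is.
with subprocess.Popen(child, stdout=subprocess.PIPE) as proc:
    raw_lines = list(proc.stdout)

# Text mode: bytes are decoded with the locale's preferred encoding and
# line endings are normalized to '\n'.
with subprocess.Popen(child, stdout=subprocess.PIPE,
                      universal_newlines=True) as proc:
    text_lines = list(proc.stdout)

print("decoding used in text mode:", locale.getpreferredencoding(False))
print(raw_lines)   # bytes objects
print(text_lines)  # str objects
```

In bytes mode you have to pick the encoding yourself (as the question does with ascii); in text mode the choice is made for you from the locale.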

If universal_newlines=False, then for line in Robocopy.stdout: iterates over b'\n'-separated lines. If the process uses a non-ASCII encoding for its output (e.g., UTF-16), you may get a wrong result even if os.linesep == '\n' on your system. If you want to consume text lines, use text mode: pass universal_newlines=True or wrap the stream explicitly with io.TextIOWrapper(process.stdout).
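The UTF-16 pitfall can be reproduced with a small sketch. The child below is hypothetical: it writes UTF-16-LE bytes to its stdout on purpose. Whether Robocopy actually emits UTF-16 depends on its options, which this example does not try to model.

```python
import io
import subprocess
import sys

# Hypothetical child that writes UTF-16-LE encoded text to its stdout.
child_code = (
    "import sys; "
    "sys.stdout.buffer.write('one\\ntwo\\n'.encode('utf-16-le'))"
)
child = [sys.executable, "-c", child_code]

# Bytes mode: lines are split on the single byte b'\n', which falls in the
# middle of a two-byte UTF-16 code unit, so the pieces are misaligned.
with subprocess.Popen(child, stdout=subprocess.PIPE) as proc:
    byte_lines = list(proc.stdout)

# Text mode with an explicit encoding: decode first, then split into lines.
with subprocess.Popen(child, stdout=subprocess.PIPE) as proc:
    text_lines = list(io.TextIOWrapper(proc.stdout, encoding="utf-16-le"))

print(byte_lines)
print(text_lines)
```

Decoding the individual items of byte_lines would fail or produce garbage, because the split happens before decoding. The io.TextIOWrapper version decodes the stream first and only then looks for line breaks.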

"The second version does include universal_newlines and therefore I specify a bufsize."

In general, it is not necessary to specify bufsize if you use universal_newlines (you may, but it is not required), and you don't need to specify bufsize in your case. bufsize=1 enables line-buffered mode (if you write to process.stdin, the input buffer is flushed automatically on newlines); otherwise it is equivalent to the default bufsize=-1.
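Line buffering only matters when you write to the child's stdin. The sketch below assumes a hypothetical child that echoes each input line in upper case; with bufsize=1 in text mode, each write ending in a newline is flushed to the child without an explicit flush() call.

```python
import subprocess
import sys

# Hypothetical child: echoes each stdin line back in upper case and
# flushes its own stdout after every line.
child_code = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line.upper())\n"
    "    sys.stdout.flush()\n"
)
child = [sys.executable, "-c", child_code]

replies = []
with subprocess.Popen(child, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                      universal_newlines=True, bufsize=1) as proc:
    for word in ["hello", "world"]:
        # The trailing newline triggers the flush in line-buffered mode.
        proc.stdin.write(word + "\n")
        replies.append(proc.stdout.readline())
    proc.stdin.close()  # signal end of input so the child can exit

print(replies)
```

When you only read from stdout, as in the question, bufsize=1 has no visible effect: how promptly lines arrive depends on when the child flushes its own output.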