Tags: python, linux, python-2.7, multiprocessing, tee

Pipe the output of a python script and its spawned processes to textfile in Linux


I have an optimisation that outputs status information as it runs. I am spawning several processes with Python's multiprocessing library to do the work in parallel, but unfortunately they occasionally throw exceptions, which I want to log along with the general status information.

I am currently using tee to pipe the output from the main process to a text file as follows; however, none of the print statements from the other processes end up there as well (which I understand is because they are separate processes with their own stdout).

python optimisation.py | tee output.txt & disown

How can I ensure all of the output ends up in a single text file? I don't care whether lines overlap or interleave.


Solution

  • In Linux (and every Unix-like system), each process starts with three open file descriptors, provided by whatever launched it (usually the shell). They are numbered 0, 1 and 2, and conventionally labeled "standard input" (stdin), "standard output" (stdout) and "standard error" (stderr). (Strictly speaking, stdin/stdout/stderr are C library names for the buffered I/O streams wrapped around these descriptors, but the short names are commonly used for the descriptors themselves; I'll use them below for brevity.)
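    A redirection can address any of these descriptors by number. A minimal illustration (using echo, whose output goes to fd 1 by default):

```shell
# ">&1" and ">&2" direct output to stdout and stderr respectively.
echo "to stdout" >&1   # fd 1 is echo's default target anyway
echo "to stderr" >&2   # fd 2 is unaffected by stdout redirections
```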

    By default, those file descriptors are all made to reference the terminal input and output. When you use the vertical bar (|) syntax, the shell creates a unidirectional interprocess communication mechanism called a pipe and "wires" the stdout of the first command to the write end of the pipe, and the stdin of the second command to the read end of the pipe. The file descriptors not mentioned will retain their defaults (i.e. the stdin of the first command, the stdout of the second, and the stderr of both commands are still connected to the terminal).

    Conventionally, Linux/Unix programs write error messages and other diagnostic information to the stderr descriptor. When there is no stdout redirection going on, this seems indistinguishable from stdout (since both are connected to the terminal). But when stdout is redirected to a pipeline, stderr is not affected; it is still connected to the terminal.
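    A quick way to see this split, using a stand-in `demo` function (hypothetical, not part of your script) that writes one line to each stream:

```shell
# demo writes one line to stdout and one to stderr.
demo() {
  echo "to stdout"
  echo "to stderr" >&2
}

# Only the stdout line travels through the pipe and is upcased;
# the stderr line bypasses tr entirely and goes straight to the terminal.
demo | tr 'a-z' 'A-Z'
```

    Run in a terminal, both lines appear, but only "TO STDOUT" passed through `tr`; "to stderr" arrived directly.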

    In your case, you are connecting the read end of the pipe to tee, which duplicates its stdin (here, the stdout of your python script) to a file and to its own stdout (still the terminal). When your python script throws an uncaught exception, the interpreter prints the exception info and stack trace to its stderr. Since you haven't redirected stderr, that goes straight to the terminal. Because tee is also echoing the script's stdout to the terminal, it is not obvious that one set of output reaches the terminal via tee while the other goes there directly; but tee never sees the exception info, so it never reaches output.txt.

    So there are two simple things you can do to save the exception info as well. First, you can save the exception info to a separate file:

    python optimisation.py 2>error.txt | tee output.txt & disown
    

    The syntax "2>error.txt" tells the shell to redirect file descriptor 2 (stderr) to the file named "error.txt".
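    You can sketch the effect with a stand-in `demo` function in place of the python command (the names and messages are illustrative):

```shell
# Stand-in for "python optimisation.py": one status line to stdout,
# one traceback-like line to stderr.
demo() {
  echo "status: iteration 1 done"
  echo "Traceback (most recent call last):" >&2
}

# stderr (fd 2) goes to error.txt; stdout flows through tee into output.txt.
demo 2>error.txt | tee output.txt >/dev/null
```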

    Second, if you prefer the exception info to be saved inline with the stdout of the script, use this syntax:

    python optimisation.py |& tee output.txt & disown
    

    The "|&" syntax specifies that both stdout and stderr should be directed to the pipe. (There are actually a number of different syntaxes that can achieve this. This one is briefest.)
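    Note that "|&" is a bash extension (bash 4.0 and later; zsh supports it too). In a strictly POSIX shell, spell the redirection out as "2>&1 |", which means "duplicate stderr onto stdout, then pipe". A sketch with a stand-in `demo` function:

```shell
demo() {
  echo "status line"
  echo "an exception" >&2
}

# "2>&1 |" merges stderr into stdout before the pipe, so tee records both.
# Equivalent to bash's: demo |& tee combined.txt
demo 2>&1 | tee combined.txt >/dev/null
```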