bash, pipe, tail, tee

Incorrect results with bash process substitution and tail?


Using bash process substitution, I want to run two different commands on a file simultaneously. In this example it is not necessary but imagine that "cat /usr/share/dict/words" was a very expensive operation such as uncompressing a 50gb file.

cat /usr/share/dict/words | tee >(head -1 > h.txt) >(tail -1 > t.txt) > /dev/null

After this command I would expect h.txt to contain the first line of the words file "A", and t.txt to contain the last line of the file "Zyzzogeton".

However what actually happens is that h.txt contains "A" but t.txt contains "argillaceo" which is about 5% into the file.

Why does this happen? It seems like either the "tail" process is terminating early or the streams are getting mixed up.

Running another similar command like this behaves as expected:

cat /usr/share/dict/words | tee >(grep ^a > a.txt) >(grep ^z > z.txt) > /dev/null

After this command I'd expect a.txt to contain all the words that begin with "a", while z.txt contains all of the words that begin with "z", which is exactly what happened.

So why doesn't this work with "tail", and with what other commands will this not work?


Solution

  • OK, what seems to happen is that once head -1 has read its first line, it exits. The next time tee tries to write to the pipe that the process substitution set up, the write fails with EPIPE and, according to man 2 write, also raises SIGPIPE in the writing process. That kills tee, which makes tail -1 see end-of-input immediately and print the last line it has received so far, and the cat on the left gets a SIGPIPE as well.

    We can see this a little better if we add a bit more to the process with head and make the output both more predictable and also written to stderr without relying on the tee:

    for i in {1..30}; do echo "$i"; echo "$i" >&2; sleep 1; done | tee >(head -1 > h.txt; echo "Head done") >(tail -1 > t.txt) >/dev/null
    

    which when I run it gave me the output:

    1
    Head done
    2
    

    so it got just 1 more iteration of the loop before everything exited (though t.txt still only has 1 in it). If we then did

    echo "${PIPESTATUS[@]}"
    

    we see

    141 141
    

    which is 128 + 13, the status bash reports for a process killed by signal 13 (SIGPIPE); this question ties it to SIGPIPE in a very similar fashion to what we're seeing here.
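
    The 141 can be decoded directly in bash, since an exit status above 128 means "killed by signal (status - 128)". A small sketch, using yes and head purely as an illustration (they are not part of the original pipeline):

    ```shell
    #!/usr/bin/env bash
    # yes writes forever; once head exits, yes's next write raises SIGPIPE.
    yes | head -1 > /dev/null
    echo "${PIPESTATUS[@]}"      # exit statuses of yes and head, in that order

    # bash's kill builtin maps an exit status back to a signal name.
    signal_name=$(kill -l 141)
    echo "141 -> SIG${signal_name}"
    ```

    If SIGPIPE is already being ignored in the calling environment, yes will report a write error instead of being killed, so the first number may differ.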

    The coreutils maintainers have added this as an example to their tee "gotchas" for posterity.

    For a discussion with the devs about how this fits into POSIX compliance you can see the (closed notabug) report at http://debbugs.gnu.org/cgi/bugreport.cgi?bug=22195

    If you have access to GNU coreutils 8.24 or later, tee has some options (not in POSIX) that can help, such as -p or --output-error=warn. Without those, you can take a bit of a risk but still get the functionality the question asks for by trapping and ignoring SIGPIPE:

    trap '' PIPE
    for i in {1..30}; do echo "$i"; echo "$i" >&2; sleep 1; done | tee >(head -1 > h.txt; echo "Head done") >(tail -1 > t.txt) >/dev/null
    trap - PIPE
    

    will have the expected results in both h.txt and t.txt, but if something else happened that wanted SIGPIPE to be handled correctly you'd be out of luck with this approach.
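
    If the GNU options mentioned above are available, they avoid touching the shell's signal handling at all. A minimal sketch, assuming GNU coreutils 8.24 or later and bash; seq stands in for the expensive command, and the scratch directory and the polling loop are illustrative additions rather than part of the original answer:

    ```shell
    #!/usr/bin/env bash
    # Assumption: tee is GNU coreutils tee >= 8.24, which understands -p.
    scratch=$(mktemp -d)         # hypothetical scratch directory for the outputs

    # Enough input that head exits long before tee has finished writing.
    # With -p, tee ignores SIGPIPE and keeps feeding the remaining outputs.
    seq 1 200000 | tee -p >(head -1 > "$scratch/h.txt") >(tail -1 > "$scratch/t.txt") > /dev/null

    # Process substitutions finish asynchronously, so wait briefly for both files.
    for _ in {1..50}; do
        [ -s "$scratch/h.txt" ] && [ -s "$scratch/t.txt" ] && break
        sleep 0.1
    done

    echo "head: $(cat "$scratch/h.txt")"
    echo "tail: $(cat "$scratch/t.txt")"
    ```

    Unlike the trap approach, this changes SIGPIPE handling only inside tee, so the rest of the shell session is unaffected.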

    Another hacky option would be to truncate t.txt before starting, then keep the head process list from finishing until t.txt is non-empty:

    > t.txt; for i in {1..10}; do echo "$i"; echo "$i" >&2; sleep 1; done | tee >(head -1 > h.txt; echo "Head done"; while [ ! -s t.txt ]; do sleep 1; done) >(tail -1 > t.txt; date) >/dev/null