
A confusion about ls, dir and tee


I know that tee reads from STDIN and creates a new file. But when it is combined with ls in a pipeline, which happens first?

For example:

➤ ls
12  123  1234
➤ ls | tee hello
12
123
1234
hello # ls catch hello
➤ ls | tee 000
12
123
1234
hello # ls didn't get 000
➤ ls | tee 0
000
12
123
1234
hello # ls didn't get 0
➤ ls | tee 00000
0
000
00000 # ls did get 00000
12
123
1234
hello
➤ 

But with dir the results are different:

➤ ls
12  123  1234
➤ dir | tee hello
12  123  1234  hello # get hello
➤ dir | tee 000
000  12  123  1234  hello
➤ dir | tee 0
0  000  12  123  1234  hello #get 0
➤ dir | tee 000000
0  000  12  123  1234  hello # didn't get 000000
➤ dir | tee 01
0  000  000000  01  12  123  1234  hello
➤ dir | tee 000000000000000000000000
0  000  000000  000000000000000000000000  01  12  123  1234  hello #get 00000000..000
➤ 

Why? Which happens first: does tee create the new file, or does ls/dir produce its output?


Solution

  • This is a race condition on the directory: the two processes run in parallel, and both touch the same directory.

    Each command in a pipeline is executed as a separate process (i.e., in a sub-shell).

    The idea of a pipeline is that the output of process A (running executable exec_A) is redirected to the input of process B (running executable exec_B):

    exec_A | exec_B

    How this is done is largely implementation dependent, but in practice the operating system has to create a buffer (a pipe) that holds the output of A, and B reads from that buffer. The buffer is set up before either process starts.

    So what happens is conceptually something like the following, with both processes running at the same time:

    exec_A > buf & exec_B < buf &

    What each process does internally with the data it reads or writes depends on its implementation. In this case tee creates the file it is going to write to as soon as it starts, which is logical, since it needs somewhere to write the incoming data.

    Given that, the outcome depends on whether process A (i.e. ls/dir) completes its directory traversal before process B has created the file. That in turn depends on how the two processes are scheduled, that is, on which of them gets access to the parent directory first.

    You can observe that ls almost always lists the newly created file in a case like this:

    ls * | tee subdir/0

    because ls only reads subdir late, after handling its other arguments, by which time tee has usually created the file.
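
The ordering described above can be made visible with a small experiment. The following is only a sketch under a few assumptions: mktemp -d is available, a one-second delay is enough for tee to start and open its file, and the names demo_dir, early, before and after are made up for illustration. It first checks that tee creates its file before any input arrives, then forces each of the two possible orders in turn.

```shell
#!/bin/sh
# Scratch directory so the demo does not touch real files (mktemp -d is assumed).
demo_dir=$(mktemp -d) || exit 1
cd "$demo_dir" || exit 1
touch 12 123 1234

# 1. tee creates its output file at startup, before any input arrives.
#    The writer stays silent for 2 seconds; we look for the file after 1.
{ sleep 2; echo late; } | tee early >/dev/null &
sleep 1
if [ -e early ]; then early_exists=yes; else early_exists=no; fi
wait            # let the background pipeline finish
rm -f early

# 2. Listing first: the command substitution finishes before tee is started,
#    so the new file cannot appear in the output.
listing=$(ls)
before=$(printf '%s\n' "$listing" | tee hello)

# 3. tee first: delaying ls gives tee time to create its file,
#    so the new file shows up in the listing.
after=$({ sleep 1; ls; } | tee 000)

echo "tee created its file before input arrived: $early_exists"
echo "listing taken before tee started:"; echo "$before"
echo "listing taken after tee started:";  echo "$after"

# Clean up the scratch directory.
cd / && rm -rf "$demo_dir"
```

Without the command substitution in step 2 or the sleep in step 3, nothing fixes the order, which is why the plain ls | tee file results in the question vary from run to run.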