Tags: linux, ssh, buffer, stdout, data-integrity

Data integrity question when collecting STDOUTs from multiple remote hosts over SSH


Suppose you run the following commands concurrently (backgrounded, all appending to the same file):

: > /tmp/output    # truncate/create the file once up front
ssh $host1 'while true; do sleep 1; echo "Hello from $HOSTNAME"; done' >> /tmp/output &
ssh $host2 'while true; do sleep 1; echo "Hello from $HOSTNAME"; done' >> /tmp/output &
ssh $host3 'while true; do sleep 1; echo "Hello from $HOSTNAME"; done' >> /tmp/output &

Then the output would look like:

Hello from host1
Hello from host2
Hello from host3
Hello from host1
...

But what if I changed it to

: > /tmp/output    # truncate/create once up front, as before
ssh $host1 'while true; do sleep 1; cat /some/large/file1.txt; done' >> /tmp/output &
ssh $host2 'while true; do sleep 1; cat /some/large/file2.txt; done' >> /tmp/output &
ssh $host3 'while true; do sleep 1; cat /some/large/file3.txt; done' >> /tmp/output &

so that the stdout from each host no longer fits into a single buffer? Would the integrity of each of file[1-3].txt (not their order) be maintained in this case, or is there a possibility that a fragment of one file slips into the middle of another, like this?

[file1_fragment1] [file2_fragment1] [file1_fragment2] [file1_fragment3] [file3_fragment1] ...

Solution

  • I would say the possibility of that happening is pretty much 100% ;-) assuming each file takes a while to cat across the network.

    The data will be written to /tmp/output on the local system in approximately the order it is received. Each ssh process writes whatever chunk of remote output has arrived so far, and the kernel appends each of those writes at the current end of the file. Nothing holds back the data coming from ssh command #2 or #3 until there's a break in #1, and besides, nothing on the local side knows where one iteration of file1.txt ends and the next begins.
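
    You can demonstrate the interleaving locally, without ssh at all. Here is a quick sketch (assuming GNU coreutils on Linux; the temp file, burst sizes, and marker letters are arbitrary choices for the demo): two background jobs append large bursts of a single letter to the same file, and squeezing repeated letters afterwards shows how often the two streams got spliced together.

    outfile=$(mktemp)
    # Two concurrent writers, each appending three 1 MiB bursts of one letter.
    ( for i in 1 2 3; do head -c 1M /dev/zero | tr '\0' A; done >> "$outfile" ) &
    ( for i in 1 2 3; do head -c 1M /dev/zero | tr '\0' B; done >> "$outfile" ) &
    wait
    # Squeeze runs of identical letters: anything longer than "AB" or "BA"
    # means the two streams were interleaved mid-burst.
    tr -s 'AB' < "$outfile" | head -c 80; echo
    rm -f "$outfile"

    On a typical run this prints a long "ABABAB..." string, because each writer's output reaches the file in pipe-buffer-sized chunks rather than as one atomic burst.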
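
    If you need each file to arrive intact, the simplest fix (a sketch of one option, not something the original commands do) is to give every host its own output file, so each stream stays contiguous, and to combine them afterwards if a single file is needed:

    ssh $host1 'while true; do sleep 1; cat /some/large/file1.txt; done' > /tmp/output.host1 &
    ssh $host2 'while true; do sleep 1; cat /some/large/file2.txt; done' > /tmp/output.host2 &
    ssh $host3 'while true; do sleep 1; cat /some/large/file3.txt; done' > /tmp/output.host3 &
    # Later, once the transfers have been stopped:
    cat /tmp/output.host1 /tmp/output.host2 /tmp/output.host3 > /tmp/output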