
bash: how to copy multiple copies of one file into another fast?


I need to stress-test a program by feeding it input files of progressively larger sizes. I have an input file inputSmall.txt which I want to replicate N times, concatenating those copies into a single file. N is large. Is there anything that would work faster than the following simple loop (e.g. N=1000)?

for i in {1..1000}
do 
    cat inputSmall.txt >> input1000.txt
done

My machine has enough disk space to store inputN.txt for very large Ns and has a lot of RAM, in case it's relevant.

Thx


Solution

  • cat is an external command, rather than being part of the shell; like all external commands, starting it up has a significant overhead. Similarly, performing the redirection >>input1000.txt on every iteration is a fairly expensive filesystem operation -- resolving the filename in its directory, opening the file, and then (when the redirection goes out of scope) flushing buffered contents and closing the file.

    It's much more efficient to do these things only once.


    Assuming that the final line of inputSmall.txt ends in a newline, the following will work correctly, and with far less overhead (note that $(<file) strips the trailing newline, which the printf format string restores):

    in=$(<inputSmall.txt)        # read the input file only once
    exec 3>>input1000.txt        # open the output file only once
    
    for ((i=0; i<1000; i++)); do
      printf '%s\n' "$in" >&3    # write the input from memory to the output fd
    done
    exec 3>&-                    # close the output fd
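
If an output containing *at least* N copies (rather than exactly N) is acceptable, another option is repeated self-concatenation: doubling the file on each pass reaches N copies in about log2(N) cat invocations instead of N. A minimal sketch, assuming the same inputSmall.txt layout as above (the one-line sample input, the input1024.txt output name, and the tmp.$$ scratch file are illustrative choices, not part of the original question):

```shell
#!/usr/bin/env bash
# Sketch: grow the output exponentially by concatenating the file with itself.
# Create a one-line sample input for demonstration.
printf 'sample line\n' > inputSmall.txt

cp inputSmall.txt input1024.txt
for ((i=0; i<10; i++)); do           # 10 doublings: 2^10 = 1024 copies
  cat input1024.txt input1024.txt > tmp.$$ && mv tmp.$$ input1024.txt
done

wc -l input1024.txt                  # 1024 lines for a 1-line input
```

Each pass reads and writes the whole file, so total I/O is roughly 2x the final size, but only ~log2(N) processes are started; for very large N that trade usually favors doubling.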