I need to stress-test a program by feeding it input files of progressively larger sizes. I have an input file inputSmall.txt which I want to replicate N times and cat those copies into a single file. N is large. Is there anything that would work faster than the following simple loop (e.g. N=1000)?
for i in {1..1000}
do
cat inputSmall.txt >> input1000.txt
done
My machine has enough disk space to store inputN.txt for very large values of N, and has a lot of RAM, in case that's relevant. Thanks!
cat is an external command, rather than being part of the shell; like all external commands, starting it up (a fork and an exec per invocation) has significant overhead. Similarly, running >>input1000.txt on every iteration is a fairly expensive filesystem operation: looking up the inode through the directory entry, opening the file, and then (on leaving scope) flushing contents and closing it.
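You can see the per-invocation startup cost in isolation by comparing an external no-op with the equivalent shell builtin. This is only a rough illustration, and the numbers will vary by machine:
time for i in {1..1000}; do /bin/true; done   # forks and execs an external program each iteration
time for i in {1..1000}; do :; done           # ':' is a builtin, so no process is spawned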
It's much more efficient to do these things only once.
Assuming that the final line of inputSmall.txt ends in a newline, the following will work correctly, and with far less overhead:
in=$(<inputSmall.txt) # read the input file only once
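                     # note: $(<file) strips any trailing newlines; the printf below adds exactly one back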
exec 3>>input1000.txt # open the output file only once
for ((i=0; i<1000; i++)); do
printf '%s\n' "$in" >&3 # write the input from memory to the output fd
done
exec 3>&- # close the output fd
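Since you want a whole series of progressively larger files, you may want to wrap this in a small function. This is just a sketch; the name replicate and its argument order are my invention, not part of the approach above, and the size check assumes inputSmall.txt ends in a single newline:
replicate() {                        # usage: replicate SRC COUNT DEST
    local in i
    in=$(<"$1")                      # read the source file once
    exec 3>"$3"                      # open (and truncate) the destination once
    for ((i=0; i<$2; i++)); do
        printf '%s\n' "$in" >&3
    done
    exec 3>&-                        # close the output fd
}
replicate inputSmall.txt 1000 input1000.txt
wc -c inputSmall.txt input1000.txt   # sanity check: output should be exactly 1000 times the input size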