Tags: linux, bash, performance, hardlink, ln

Accelerate/parallelize massive hard-linking loop


I'm working on a script that generates a folder containing several thousand binaries, which are then bundled into multiple installers for different products. One major part of the script "copies" files from various arbitrary locations into a temporary path to produce the desired output structure for the installer/tarball.

The single slowest part of this operation is a massive loop that essentially looks like:

for i in $(find /some/tmp/path -iname "*")
do
    ln "${i}" "/TEMP1/${i}"
done

Hard-linking is used here because it is a much faster way to "copy" files into the desired location for the final output tarball, and it uses essentially no additional disk space, unlike an actual copy of each file.
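
(For illustration: a hard link is just an additional directory entry pointing at the same inode, so no file data is duplicated. A minimal example; the file name foo.bin is hypothetical:)

ln /some/tmp/path/foo.bin /TEMP1/foo.bin
# both names now refer to the same inode, shown with a link count of 2
ls -li /some/tmp/path/foo.bin /TEMP1/foo.bin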

However, this loop is painfully slow: it takes about 15 minutes to hard-link 30,000 files. Given that this is running on a very powerful machine with a top-end SSD, is this the sort of task that could be sped up greatly by wrapping it in a parallel-type tool, or simply by backgrounding all the ln operations, tracking the PID of each, and checking at the end that all of them exited successfully? Alternatively, is there some other means of speeding this up that I haven't considered?


Solution

  • This should do the trick: the trailing & backgrounds each ln so the links are created concurrently, and the final wait blocks until every background job has finished.

    for i in $(find /some/tmp/path -iname "*")
    do
        # trailing & runs each ln in the background
        ln "${i}" "/TEMP1/${i}" &
    done
    # block until every backgrounded ln has exited
    wait
    

    Let me know if it works. Regards!
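
  • A caveat on the approach above: it backgrounds all 30,000 ln calls at once, which can run into per-user process limits, and the unquoted $(find ...) breaks on file names containing whitespace. A minimal sketch that bounds the concurrency instead, assuming GNU find/xargs/ln and a flat target directory (the paths are from the question; the -n and -P values are arbitrary choices):

        # -print0/-0 make the pipeline safe for any file name; xargs hands
        # batches of up to 1000 files to each ln and runs up to 8 ln
        # processes at a time; ln -t creates the links inside /TEMP1
        find /some/tmp/path -type f -print0 | xargs -0 -n 1000 -P 8 ln -t /TEMP1

    If the goal is a hard-linked replica of an entire tree, a single process with no per-file fork/exec overhead may beat any parallel loop. Assuming GNU coreutils, cp's -l flag creates hard links instead of copying:

        # recreate the directory structure under /TEMP1,
        # hard-linking every file instead of duplicating its data
        cp -al /some/tmp/path /TEMP1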