
Ensuring all .sh curl download scripts complete when run with GNU Parallel


I'm executing the following command, which runs a group of scripts, each of which performs a curl download.

parallel --resume-failed --joblog logshd.log {1} ::: SH/*.sh

The set of files downloaded is quite large. I've noticed some files don't download.

I hoped that the --resume-failed option would ensure that all failed downloads are retried and complete.

  1. I'm not clear on whether that means I need to run the command a second time, or whether the retries should happen within the single run.

From the GNU Parallel documentation:

Where --resume-failed reads the commands from the command line (and ignores the commands in the joblog), --retry-failed ignores the command line and reruns the commands mentioned in the joblog.

  2. I'm not clear on what "ignores the command line" or "ignores the commands in the joblog" means. Could that be clarified?

  3. Can --resume-failed and --retry-failed be given in the same command, and if so, what is the effect?

Regards Conteh


Solution

  • If we assume the downloads fail intermittently, then your answer is --retries 10. It will run each command up to 10 times before giving up; example invocations for all three options follow this list.

    --resume-failed and --retry-failed are both used when GNU Parallel has finished, and you then figure out that you want to retry some of the jobs again.

    The difference between the two is in how to retry the command.

    • --retry-failed will run exactly the same command as failed before. It does that by looking in the joblog for the command. This is typically what you want.
    • --resume-failed is used if you figure out that the failing command itself needed to be changed: i.e. GNU Parallel should not run exactly the same command, but a (typically slightly changed) command with the same arguments instead.
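
As a minimal sketch of how these options could be applied to the command above (the retry count of 10 is just an example value, and the bash -x wrapper in the last command is hypothetical, only to show a changed command template):

# Retry each failing download up to 10 times within a single run:
parallel --retries 10 --joblog logshd.log {1} ::: SH/*.sh

# Afterwards, rerun exactly the commands the joblog recorded as failed
# (the command template and ::: arguments on this line are ignored):
parallel --retry-failed --joblog logshd.log {1} ::: SH/*.sh

# Or rerun the failed arguments with a changed command template,
# here hypothetically wrapping each script in bash -x for debugging:
parallel --resume-failed --joblog logshd.log bash -x {1} ::: SH/*.sh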