GNU parallel arguments

From the example

seq 1 100 | parallel -I @@ \
> 'mkdir top-@@;seq 1 100 | parallel -X mkdir top-@@/sub-{}'

How do -X, @@ and {} work? Also, what is the behavior when '1' or '.' is placed inside {}? Is \ > used for redirection here?

I was going through the tutorial at https://www.youtube.com/watch?v=P40akGWJ_gY&list=PL284C9FF2488BC6D1&index=2 and reading the man parallel page. I have picked up some basic knowledge, but not how to actually use these options.

Solution

Let's do the easy stuff first.

The backslash (\) at the end of the line tells the shell that the next line is a continuation of the current one, and the greater-than sign (>) is the shell's prompt for that continuation line. It is not a redirection, and it is not something you type. It is no different from typing:

echo \
hi

where you will actually see this:

echo \
> hi
hi

So you can ignore the \ and the > and just run the command on a single line.
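If you want to convince yourself, here is a minimal sketch using only the shell (no parallel needed). The variable names split and joined are just mine for illustration. In an interactive shell, the > you see is the secondary prompt held in PS2, which defaults to "> ".

```shell
# The backslash-newline pair is removed before the command runs,
# so both forms are the same command as far as the shell is concerned.
split=$(echo \
hi)
joined=$(echo hi)

echo "$split"
echo "$joined"
```

Both lines print the same thing, which is why the continuation can be dropped safely.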

Next, the things in {}. These are described in the GNU Parallel manual page, but essentially:

{1} refers to the first parameter
{2} refers to the second parameter, and so on

Test this with the following where the column separator is set to a space but we use the parameters in the reverse order:

echo A B | parallel --colsep ' ' echo {2} {1}
B A

{.} refers to a parameter, normally a filename, with its extension removed

Test this with:

echo fred.dat | parallel echo {.}
fred
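A few related replacement strings are worth trying side by side. This is only a sketch: dir/fred.dat is a made-up name (the file does not need to exist because we only echo it), and the guard on the first line is my own addition so the snippet does nothing if GNU Parallel is not the parallel on your PATH.

```shell
# Only run the demo if this is GNU Parallel (not the moreutils tool of the same name).
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    # {}   the whole argument
    # {.}  argument with its extension removed
    # {/}  basename (directory part removed)
    # {//} directory part only
    # {/.} basename with its extension removed
    replaced=$(parallel echo '{}' '{.}' '{/}' '{//}' '{/.}' ::: dir/fred.dat)
    echo "$replaced"
fi
```

Here the argument is supplied after ::: instead of on standard input, which is handy for quick experiments.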

Now let's come to the actual question, with the continuation line removed as described above and with everything on a single line:

seq 1 100 | parallel -I @@ 'mkdir top-@@;seq 1 100 | parallel -X mkdir top-@@/sub-{}'

So, this is essentially running:

seq 1 100 | parallel -I @@ 'ANOTHER COMMAND'

Ole has used -I @@ to make @@ the replacement string of the outer command in place of {}. That way the outer parallel leaves the {} untouched, and it is still there for the inner parallel to substitute. So, wherever you see @@, you just need to replace it with a value from the first seq 1 100.

The second parallel command is pretty much the same as the first one, but here Ole has used -X. If you watch the video you link to, you will see that he shows how it works earlier on. It passes "as many parameters as possible" to each command, up to the command-line length allowed by the system's ARG_MAX, and it repeats the text around {} (here top-@@/sub-) for every parameter. So, if you want 10,000 directories created, instead of this:

seq 1 10000 | parallel mkdir {}

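which will start 10,000 separate processes, each one running mkdir, you will start only a handful of mkdir processes, each with many parameters (by default -X spreads the parameters across the parallel job slots; add -j1 if you want them packed into as few commands as possible):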

seq 1 10000 | parallel -X mkdir

That avoids the need to create 10,000 separate processes and speeds things up.
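You can see the difference on a small scale with --dry-run, which prints the commands instead of running them. This is a sketch under a couple of assumptions of mine: the sub- prefix and the variable names are only for illustration, the guard skips the demo if GNU Parallel is not installed, and -j1 is there so the values are not spread over several job slots.

```shell
# Only run the demo if this is GNU Parallel.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    # Without -X: one mkdir command line per input value (-k keeps input order).
    one_each=$(seq 1 5 | parallel -k --dry-run mkdir sub-{})
    echo "$one_each"

    # With -X: the word containing {} is repeated for every value,
    # so all five values end up in a single mkdir command line.
    packed=$(seq 1 5 | parallel -j1 -X --dry-run mkdir sub-{})
    echo "$packed"
fi

# The upper bound on command-line length that limits how much -X can pack:
getconf ARG_MAX
```

The first command should print five separate mkdir lines and the second a single mkdir line with five arguments.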

Let's now look at the outer parallel invocation and do a dry run to see what it would do, without actually doing anything:

seq 1 100 | parallel -k --dry-run -I @@ 'mkdir top-@@;seq 1 100 | parallel -X mkdir top-@@/sub-{}'

Output

mkdir top-1;seq 1 100 | parallel -X mkdir top-1/sub-{}
mkdir top-2;seq 1 100 | parallel -X mkdir top-2/sub-{}
mkdir top-3;seq 1 100 | parallel -X mkdir top-3/sub-{}
mkdir top-4;seq 1 100 | parallel -X mkdir top-4/sub-{}
mkdir top-5;seq 1 100 | parallel -X mkdir top-5/sub-{}
mkdir top-6;seq 1 100 | parallel -X mkdir top-6/sub-{}
mkdir top-7;seq 1 100 | parallel -X mkdir top-7/sub-{}
mkdir top-8;seq 1 100 | parallel -X mkdir top-8/sub-{}
...
...
mkdir top-99;seq 1 100 | parallel -X mkdir top-99/sub-{}
mkdir top-100;seq 1 100 | parallel -X mkdir top-100/sub-{}

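So, now you can see it is going to start 100 jobs, each of which makes one top-level directory and then runs an inner parallel to create that directory's 100 subdirectories. Because of -X, each inner parallel needs only a few mkdir invocations rather than 100. The end result is 100 top-level directories containing 10,000 subdirectories in total.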
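If it helps to think about the end result without parallel at all, here is a plain serial sketch of the same directory layout. It is not what parallel actually executes, just an equivalent nested loop, scaled down to 3 x 3 and run inside a throwaway directory from mktemp (both choices are mine).

```shell
# Work in a throwaway directory so nothing is created in the current one.
workdir=$(mktemp -d)

for top in $(seq 1 3); do              # plays the role of the outer parallel and @@
    mkdir "$workdir/top-$top"
    for sub in $(seq 1 3); do          # plays the role of the inner parallel and {}
        mkdir "$workdir/top-$top/sub-$sub"
    done
done

# Count everything that was created below the throwaway directory.
count=$(find "$workdir" -mindepth 1 -type d | wc -l)
echo "directories created: $((count))"
```

The parallel version produces the same tree, only with the outer iterations running concurrently and the inner mkdir calls batched by -X.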