Tags: bash, multiprocessing, gnu-parallel

Using GNU Parallel for cluster computing over LAN with rsync


I have two machines, and I want to use GNU Parallel to have multiple processes 'cat' the contents of some text files from both machines.

I have the following setup.

On a local machine, in the same directory, I have the following files:

  • cmd.sh - a bash file with contents: 'cat "$@"'
  • test1.txt - a text file with contents: 'Test 1'
  • test2.txt - a text file with contents: 'Test 2'
  • test3.txt - a text file with contents: 'Test 3'
  • nodefile - a text file with the following contents:

    2/:

    4/ [email protected]

This follows the nodefile example from the WordPress link (below), assuming my local IP is 192.168.0.2.
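For anyone who wants to reproduce the local side of this setup, the files can be recreated with a short script. This is only a sketch: `make_setup` and the scratch directory are hypothetical names, and the `#!/bin/bash` line is an addition (the original cmd.sh contains only `cat "$@"`) so that the script can be executed directly. The nodefile is left out because it depends on the remote login.

```shell
#!/usr/bin/env bash
# Sketch: recreate the local files described above in a given directory.
# make_setup is a hypothetical helper name, not part of GNU Parallel.
make_setup() {
  local dir="$1"
  mkdir -p "$dir"
  # cmd.sh prints the contents of every file passed as an argument.
  # The shebang line is an assumption added so ./cmd.sh can be executed.
  printf '%s\n' '#!/bin/bash' 'cat "$@"' > "$dir/cmd.sh"
  chmod +x "$dir/cmd.sh"
  local i
  for i in 1 2 3; do
    echo "Test $i" > "$dir/test$i.txt"
  done
}

scratch="$(mktemp -d)"   # throwaway directory for the demo
make_setup "$scratch"
# Run the script locally, without GNU Parallel, as a sanity check.
(cd "$scratch" && ./cmd.sh test1.txt test2.txt test3.txt)
```

Running cmd.sh by hand like this confirms the script itself works before any remote execution is involved.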

None of these files are replicated on the remote machine. I want to have multiple processes 'cat' the contents of each of the test?.txt files from both machines.

Preferably, this:

  • Wouldn't leave any artifacts on the remote machine
  • Would leave the contents of the local directory intact.

I have been able to run multiprocessing commands remotely with the nodefile, as per this WordPress example, but none that involve echoing file contents remotely.

So far, I have something like the following:

parallel --sshloginfile nodefile --workdir . --basefile cmd.sh -a cmd.sh --trc ::: test1.txt test2.txt test3.txt

But this isn't working: it removes the files from my directory without replacing them, and it gives rsync errors. I (unfortunately) can't provide the errors at the moment, or replicate the setup.

I am very inexperienced with parallel. Can anyone guide me on the syntax to accomplish this task? I haven't been able to find the answer (so far) in the man pages or on the web.

I am running Ubuntu 16.04 LTS and using the latest version of GNU Parallel.


Solution

  • You make a few mistakes:

    • -a is used to give an input source. It is basically an alias for ::::, so -a cmd.sh makes GNU Parallel read cmd.sh as a list of arguments instead of running it.
    • You do not give a command to run. The command goes after the options to GNU Parallel and before the :::.
    • --trc takes an argument (namely the file to transfer back). You do not have a file to transfer back, so use --transfer --cleanup instead.
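The first two points can be checked locally, without any remote machine. The sketch below assumes GNU Parallel is on the PATH (the moreutils package ships a different program that is also called `parallel`, so the script checks for that and skips the demo). `args.txt` and the variable names are hypothetical. It uses `--dry-run`, which prints the commands that would be run instead of running them, and `-k`, which keeps the output in input order.

```shell
#!/usr/bin/env bash
# Sketch: compare "-a FILE" with ":::: FILE" and preview commands with --dry-run.
# Assumes GNU Parallel (not the moreutils "parallel") is installed.
if ! parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  echo "GNU Parallel not found; skipping demo" >&2
  exit 0
fi

demo_dir="$(mktemp -d)"   # hypothetical scratch directory
printf '%s\n' test1.txt test2.txt test3.txt > "$demo_dir/args.txt"

# Both forms read one argument per line from args.txt.
# The command (./cmd.sh) sits after the options and before the input source.
with_a="$(parallel -k --dry-run -a "$demo_dir/args.txt" ./cmd.sh)"
with_colons="$(parallel -k --dry-run ./cmd.sh :::: "$demo_dir/args.txt")"

echo "$with_a"
if [ "$with_a" = "$with_colons" ]; then
  echo "-a and :::: produce the same commands"
fi
rm -rf "$demo_dir"
```

Because nothing is executed under `--dry-run`, cmd.sh does not even need to exist for this check. It only shows how GNU Parallel would build the command lines.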

    So:

    chmod +x cmd.sh
    parallel --sshloginfile nodefile --workdir . --basefile cmd.sh --transfer --cleanup ./cmd.sh ::: test1.txt test2.txt test3.txt
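Since the original attempt removed files from the local directory, it may be worth confirming that the directory is untouched after a run. One way to do that, sketched here with a hypothetical helper `snapshot_dir` (not part of GNU Parallel), is to record a checksum of every file before and after and compare the two listings.

```shell
#!/usr/bin/env bash
# Sketch: fingerprint a directory so two snapshots can be compared.
# snapshot_dir is a hypothetical helper, not part of GNU Parallel.
snapshot_dir() {
  # Print "checksum size ./path" for every regular file, sorted by path.
  (cd "$1" && find . -type f -exec cksum {} + | LC_ALL=C sort -k3)
}

before="$(snapshot_dir .)"
# ... run the parallel command from above here ...
after="$(snapshot_dir .)"

if [ "$before" = "$after" ]; then
  echo "local directory unchanged"
else
  echo "local directory changed" >&2
fi
```

The same helper could be run on the remote machine over ssh to look for leftover files, although the working directory used there depends on how --workdir is resolved.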
    

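    It is unclear whether you want to transfer anything to the remote machine. If you do not, and you simply want to run the same command once on every machine in the nodefile (--nonall runs the command on all sshlogins and takes no arguments), then this may really be the correct answer: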

    parallel --sshloginfile nodefile --nonall --workdir . ./cmd.sh test1.txt test2.txt test3.txt