Tags: ssh, gnu-parallel, pssh

How can I use parallel across multiple computers so that each task in a list runs only once?


I am trying to use parallel across multiple servers over ssh. What I would like to do is something like:

    parallel -s computer_list.txt < command.txt

where computer_list.txt contains the list of servers and command.txt looks like

    first_job.sh
    second_job.sh
    ...

But I don't want every server to run every job in the list. I want each *.sh to be executed exactly once, on a random server. All of the servers can reach all the files they need to execute each command.

In other words, I am looking for a generalization of:

    parallel < command.txt

Solution

  • I guess you could do something like this:

    servers.txt

    server1
    server2
    server3
    server4
    serverA
    serverB
    raspi10
    raspi11
    raspi12
    raspi13
    supercomputerA
    supercomputerB
    

    jobs.txt

    job1
    job2
    job3
    job4
    job5
    job6
    

    Then use this bash script:

    #!/bin/bash
    
    # Read in list of jobs into array
    jobs=( $(<jobs.txt) )
    
    # Get randomised list of servers
    servers=( $( gshuf servers.txt) )
    
    # Assign each job to a server and execute in parallel
    for ((i=0; i<${#jobs[@]}; i++)); do
       echo "ssh \"${servers[i]}\" \"${jobs[i]}\""
    done | parallel
    

    Example

    That generates the following input for GNU Parallel:

    ssh "raspi12" "job1"
    ssh "serverA" "job2"
    ssh "serverB" "job3"
    ssh "raspi13" "job4"
    ssh "server3" "job5"
    ssh "supercomputerB" "job6"
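
One caveat with the script above: it pairs job i with server i, so if jobs.txt is longer than servers.txt the later jobs end up with an empty host name. Below is a minimal sketch of one way around that, cycling through the shuffled servers with a modulo index. The function name assign_jobs is made up for illustration, and it assumes bash with either shuf or gshuf on the PATH:

```shell
#!/bin/bash
# Hypothetical helper: print one 'ssh "<server>" "<job>"' line per job,
# wrapping around the shuffled server list when jobs outnumber servers.
assign_jobs() {
    local jobs_file=$1 servers_file=$2
    local shuffle=shuf
    # Fall back to gshuf (the macOS name for GNU shuf) if shuf is missing
    command -v shuf >/dev/null 2>&1 || shuffle=gshuf

    local jobs=() servers=() line i

    # Read non-empty lines of the job list, one job per line
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ -n $line ]]; then jobs+=("$line"); fi
    done < "$jobs_file"

    # Read the server list in random order
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ -n $line ]]; then servers+=("$line"); fi
    done < <("$shuffle" "$servers_file")

    if (( ${#servers[@]} == 0 )); then
        echo "no servers listed in $servers_file" >&2
        return 1
    fi

    for ((i = 0; i < ${#jobs[@]}; i++)); do
        # The modulo index reuses servers once the list is exhausted
        printf 'ssh "%s" "%s"\n' "${servers[i % ${#servers[@]}]}" "${jobs[i]}"
    done
}

# Usage: assign_jobs jobs.txt servers.txt | parallel
```

Each job still appears exactly once in the output. The difference is only that a server may be handed more than one job when there are fewer servers than jobs.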
    

    Notes:

    gshuf is the name under which GNU shuf (shuffle) is installed on a Mac, for example by Homebrew's coreutils package. On Linux it is usually plain shuf.
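
It may also be worth knowing that GNU Parallel can hand jobs out to remote hosts by itself through its --sshloginfile option (short form --slf). Each input line is then run once on whichever listed host has a free slot, which is close to what the question asks for, although the choice of host depends on availability and is not strictly random. A small sketch, assuming GNU Parallel is installed locally, passwordless ssh works to every host, and the job scripts are reachable at the same path on each of them. The wrapper name run_on_servers is made up for illustration:

```shell
#!/bin/bash
# Hypothetical wrapper around GNU Parallel's --sshloginfile option.
# $1: file with one ssh login per line (e.g. server1 or user@server2)
# $2: file with one command per line
run_on_servers() {
    local servers_file=$1 commands_file=$2
    # Each line of the command file is executed once, on one of the hosts
    parallel --sshloginfile "$servers_file" < "$commands_file"
}

# Usage: run_on_servers computer_list.txt command.txt
```

Unlike the script above, nothing is assigned up front, so a slow server simply ends up taking fewer jobs.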