hpc snakemake gnu-parallel

snakemake (or parallel) for multiple machines over ssh


Say you have a Snakemake file that produces around 50,000 jobs, but each one is small and takes only a few seconds to run.

From the head node, you have access to multiple servers named:

machine01
machine02
machine03
machine04
machine05
machine06

To make matters more interesting, the machines have different numbers of cores. What is the best way to send the different jobs to different machines for parallel execution? I tried the batch option in Snakemake, but it does not seem to do what I thought it did.


Solution

  • With GNU Parallel it could look like this:

    cat arguments | parallel --slf list-of-servers my_script
    

    If GNU Parallel is installed on the servers, it will detect the number of cores on each server.
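
    If GNU Parallel is not installed on the servers, the core counts can be written into the server list by hand using the `ncpu/sshlogin` form. The following is only a sketch: the core counts are made-up placeholders, and `run_jobs` is a hypothetical wrapper name so that nothing is launched until you call it. It assumes passwordless ssh to each machine and that `my_script` is available on every one of them.

```shell
#!/bin/sh
# Write the sshlogin file read by --slf (--sshloginfile).
# Format: <cores>/<host>. The core counts below are placeholders,
# not the real values for these machines.
cat > list-of-servers <<'EOF'
8/machine01
8/machine02
16/machine03
16/machine04
32/machine05
4/machine06
EOF

# Hypothetical wrapper: each line of the file "arguments" becomes one
# invocation of my_script on whichever server has a free job slot.
run_jobs() {
    parallel --slf list-of-servers my_script < arguments
}

# Uncomment to start the jobs:
# run_jobs
```

    With the counts given explicitly, GNU Parallel uses them instead of trying to detect the cores, so by default each machine runs as many jobs at once as the number written in front of its name.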