Tags: julia, distributed-computing, compiler-flags

Is using the -L flag and an addprocs script a more powerful version of -p and --machine-file?


So I have a moderately complex set of requirements for my worker processes: I want to use the master-slave topology and a non-default working directory, and I also want to mix local and remote workers.

As far as I can tell from reading the --machine-file section of the documentation, it will not let me do that.

So I am looking at the -L <file> flag:

>julia -h
...
-L, --load <file>         Load <file> immediately on all processors
...

So if I do not use the -p or --machine-file flags, then there is initially only one processor, so "on all processors" just means on the single master process.

So I tried this out

start_workers.jl

# Add remote workers on the two cluster machines,
# one per core on each machine (:auto)
addprocs([
          ("cluster_c4_1", :auto),
          ("cluster_c4_2", :auto)
    ],
        dir="/mnt/",
        topology=:master_slave
        )

# Add local workers (defaults to one per local core)
addprocs(
        dir="/mnt/",
        topology=:master_slave
        )

test.jl

println("*************")
println(workers())
println("-------------")

Running it:

>julia -L start_workers.jl test.jl
*************
[2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21]
-------------

So it all looks good: I got my 20 workers. Have I done anything unreasonable? Is this the best way?


Solution

  • That's exactly how I'm deploying it on an HPC cluster under the Torque scheduler. In fact, I'm in the process of rewriting the cluster manager to support more options when adding processes through the Torque scheduling system in particular, so I've spent quite a bit of time looking into this.

    You might also want to be aware of the ClusterManagers package (Pkg.add("ClusterManagers")), which extends the ability of addprocs under a variety of environments, such as when you need to request resources from a scheduler. It sounds like passwordless ssh is possible for you, so the default cluster manager is sufficient in your case.
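    For reference, a minimal sketch of what scheduler-backed addprocs looks like with that package. This only runs on a machine with a live PBS/Torque scheduler, and the queue name "batch" is just an example; check the ClusterManagers README for the exact function names your version exports:

    using ClusterManagers

    # Ask the Torque/PBS scheduler for 20 worker processes via qsub,
    # instead of ssh-ing to explicitly named machines.
    # ("batch" is a placeholder queue name, not from the question)
    addprocs_pbs(20, queue="batch")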

    I don't believe there is any way of defining the extra topology and directory parameters on the command line, so your approach is correct.