So I have a moderately complex set of requirements for my worker processes: I want to use a master/slave topology and a non-default working directory, and I also want to mix local and remote workers.
As far as I can tell from reading the --machine-file section of the documentation, it will not let me do that.
So I am looking at the -L <file> parameter:
>julia -h
...
-L, --load <file>          Load <file> immediately on all processors
...
So if I do not use the -p or --machine-file flags, then there is initially only one process, so "all processors" just means loading on the master process.
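As a sanity check on that reading, a file along these lines (check.jl is just an illustrative name), loaded with plain julia -L check.jl and no -p or --machine-file flag, should only ever see the master process:

# check.jl -- loaded with: julia -L check.jl
println(nprocs())   # prints 1: only the master process exists at this point
println(myid())     # prints 1: so -L loads the file on the master alone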
So I tried this out:
# Remote workers: launch as many workers as there are cores (:auto)
# on each of the two cluster machines, over passwordless ssh.
addprocs([
        ("cluster_c4_1", :auto),
        ("cluster_c4_2", :auto)
    ],
    dir="/mnt/",
    topology=:master_slave
)

# Local workers: with no count given, one worker per local core is added.
addprocs(
    dir="/mnt/",
    topology=:master_slave
)

println("*************")
println(workers())
println("-------------")
>julia -L start_workers.jl pl.jl
*************
[2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21]
-------------
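To double-check that the dir and topology settings actually took effect, a couple of extra lines like these could go at the end of pl.jl (a sketch; the remotecall_fetch(f, id) argument order assumes Julia 0.5 or later):

println(nworkers())                 # expect 20 workers in total
println(remotecall_fetch(pwd, 2))   # expect the /mnt/ working directory on worker 2
# with topology=:master_slave the workers connect only to the master, not to each other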
So it all looks good: I got my 20 workers. Have I done anything unreasonable? Is this the best way?
That's exactly how I'm deploying it on an HPC cluster under the Torque scheduler. In fact, I'm in the process of rewriting the cluster manager to support more options when adding processes through the Torque scheduling system in particular, so I've spent quite a bit of time looking into this.
You might also want to be aware of the ClusterManagers package (Pkg.add("ClusterManagers")), which extends the ability of addprocs in a variety of environments, such as when you need to request resources from a scheduler. It looks like passwordless ssh is possible for you, so the default cluster manager is sufficient in your case.
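For example, under Torque/PBS something along these lines requests workers from the queue instead of naming machines by hand (a rough sketch: addprocs_pbs comes from ClusterManagers, and whether it forwards extras such as dir and topology on to addprocs may depend on the package version, so check before relying on it):

using ClusterManagers
# Ask the PBS/Torque queue for 20 workers instead of listing hostnames.
addprocs_pbs(20)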
I don't believe there is any way of defining the extra topology and directory parameters on the command line, so your approach is correct.