Tags: multithreading, parallel-processing, julia, sungridengine

Getting Julia SharedArrays to play nicely with Sun Grid Engine


I have been trying to get a Julia program to run correctly in an SGE environment with SharedArrays. I read several threads on Julia and SGE, but most of them seem to deal with MPI. The function bind_pe_procs() from this Gist seems to correctly bind the processes to a local environment. A script like

### define bind_pe_procs() as in Gist
### ...
println("Started julia")
bind_pe_procs()
println("do SharedArrays initialize correctly?")
x = SharedArray(Float64, 3, pids = procs(), init = S -> S[localindexes(S)] = 1.0)
pids = procs(x)
println("number of workers: ", length(procs()))
println("SharedArrays map to ", length(pids), " workers")

yields the following output:

starting qsub script file
Mon Oct 12 15:13:38 PDT 2015
calling mpirun now 
exception on 2: exception on exception on 4: exception on exception on 53: : exception on exception on exception on Started julia
parsing PE_HOSTFILE
[{"name"=>"compute-0-8.local","n"=>"5"}]compute-0-8.local
ASCIIString["compute-0-8.local","compute-0-8.local","compute-0-8.local","compute-0-8.local"]adding machines to current system
done
do SharedArrays initialize correctly?
number of workers: 5
SharedArrays map to 5 workers

Curiously, this doesn't seem to work if I need to load the array from a file and convert it to a SharedArray with convert(SharedArray, vec(readdlm(FILEPATH))). If the script is

println("Started julia")
bind_pe_procs()

### script reads arrays from file and converts to SharedArrays
println("running script...")
my_script()

then the result is garbage:

starting qsub script file
Mon Oct 19 09:18:29 PDT 2015
calling mpirun now Started julia
parsing PE_HOSTFILE
[{"name"=>"compute-0-5.local","n"=>"11"}]compute-0-5.local
ASCIIString["compute-0-5.local","compute-0-5.local","compute-0-5.local","compute-0-5.local","compute-0-5.local","compute-0-5.local","compute-0-5.local","compute-0-5.local","compute-0-5.local","compute-0-5.local"]adding machines to current system
done
running script...
Current number of processes: [1,2,3,4,5,6,7,8,9,10,11]
SharedArray y is seen by [1] processes
### tons of errors here
### important one is "SharedArray cannot be used on a non-participating process"

so somehow the SharedArrays do not map correctly to all cores. Does anybody have any suggestions or insights into this problem?
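For comparison, one pattern worth testing (a sketch only, not verified under SGE) is to read the file on the master process and then copy it into a SharedArray constructed with explicit pids after the workers have been bound. FILEPATH below stands in for the real file path, as in the question:

```julia
### sketch only: construct the SharedArray with explicit pids
### after bind_pe_procs() has added the SGE-allocated workers
bind_pe_procs()                        # add workers first
data = vec(readdlm(FILEPATH))          # plain Array, read on the master
y = SharedArray(Float64, length(data), pids = procs())
y[:] = data                            # copy into the shared buffer
println("SharedArray y is seen by ", length(procs(y)), " processes")
```

If this version reports all processes while the convert-based version reports only [1], that would point at the SharedArray being created before the workers were attached, or with a pids set that excludes them.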


Solution

  • One workaround I have used in my own work is to force SGE to submit the job to a particular node and then limit the parallel environment to the number of cores I want to use.

    Below I provide an SGE qsub script for a 24-core node where I want to use only 6 cores.

    #!/bin/bash
    # lots of available SGE script options, only relevant ones included below
    
    # request processes in parallel environment 
    #$ -pe orte 6 
    
    # use this command to dump job on a particular queue/node
    #$ -q all.q@compute-0-13
    
    # -p 5 starts 5 workers; together with the master process this
    # matches the 6 slots requested by the -pe line above
    /share/apps/julia-0.4.0/bin/julia -p 5 MY_SCRIPT.jl
    

    Pro: plays nicely with SharedArrays. Con: the job will wait in the queue until the node has sufficient cores available.
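    With this setup the workers already exist when the script starts, so a convert-based load should participate on every process. A quick sanity check at the top of MY_SCRIPT.jl might look like the following sketch (FILEPATH is a placeholder, and the assertion assumes the -p 5 invocation above):

    ```julia
    ### sanity check: confirm the SharedArray participates on all processes
    y = convert(SharedArray, vec(readdlm(FILEPATH)))
    @assert length(procs(y)) == length(procs())
    println("SharedArray maps to ", length(procs(y)),
            " of ", length(procs()), " processes")
    ```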