Search code examples
parallel-processingjuliadifferential-equationsdifferentialequations.jl

How can I run a simple parallel array assignment operation in Julia?


I have to solve a differential equations system many times, iterating over a parameter. For this, I run a loop over a list of the parameter, and store the solution (evaluated at an array of time values) for each parameter. So I have a 2D array in which I store solutions (each row is for a value of the parameter).

Now, since any iteration has nothing to do with another one, I thought of doing this in parallel.

Here is my code:

using DifferentialEquations
using SharedArrays
using DelimitedFiles
using Distributed

function tf(x,w)
    return x*sin(w*x)
end

function sys!(dv,v,w,t)
    dv[1] = w*v[1]
    dv[2] = tf(v[1],w)
end

times = LinRange(0.1,2,25)

params = LinRange(0.1,1.2,100)

sols = SharedArray{Float64,2}((length(times),length(params)))

@distributed for i=1:length(params)
    println(i)
    init_val = [1.0,1.0]
    tspan = (0.0,2.0)
    prob = ODEProblem(sys!,init_val,tspan,params[i])
    sol = solve(prob)
    sols[:,i] .= sol(times)[2,:]
end

writedlm("output.txt",sols)

Now, when I run this without the @distributed prefixed to the loop, this runs perfectly.

When I run this code, however, the println statement does not work, and although the file "output.txt" is stored, it is full of zeros.

I'm running this code from the command line this way

julia -p 4 trycode.jl

This shows no output and just works for a minute and does nothing, although the file "output.txt" is stored. It's as if the loop is never entered.

I would really appreciate some help on how to set up this simple parallel loop.


Solution

  • As Bill says there's two main ways to think about parallelism in Julia: the threaded model, which was introduced in Julia 1.3 and does shared memory parallelism via the Threads.@threads macro, and distributed processing using the Distributed.@distributed macro, which parallelizes across different Julia processes.

    Threads is definitely closer to an "automagic" parallelism speed-up with minimal or no re-write of code and often a great option, although one has to take care to ensure that whatever operation one is running is threadsafe, so do always check that results come out the same.

    As your question was about @distributed parallelism originally, let me answer that as well. If you do @distributed parallelism, the simplest mental model (I believe) to think about what's going on is to imagine you're running your code in entirely seperate Julia REPLs.

    Here's a version of your code adapted to the @distributed model:

    using Distributed
    addprocs(2)
    
    using SharedArrays
    using DelimitedFiles
    
    @everywhere begin 
        using DifferentialEquations
    
        tf(x,w) = x*sin(w*x)
    
        function sys!(dv,v,w,t)
            dv[1] = w*v[1]
            dv[2] = tf(v[1],w)
        end
    
        times = LinRange(0.1,2,25)
        params = LinRange(0.1,1.2,100)
    end
    
    sols = SharedArray{Float64,2}((length(times),length(params)))
    
    @sync @distributed for i=1:length(params)
        println(i)
        init_val = [1.0,1.0]
        tspan = (0.0,2.0)
        prob = ODEProblem(sys!,init_val,tspan,params[i])
        sol = solve(prob)
        sols[:,i] .= sol(times)[2,:]
    end
    
    sols
    

    What has changed?

    • I added addprocs(2) at the start of the script. This isn't necessary if you start Julia with p -2 (or whatever number of processes you want) as you do it, but I often find it easier to reason about the code when it explicitly sets up the parallel environment in the code directly. Note that this isn't currently possible for threads, i.e. you need to set your JULIA_NUM_THREADS environment variable before starting Julia and can't change the number of threads once you're up and running.

    • I then moved bits of the code into a @everywhere begin ... end block. This essentially runs the operations enclosed in the block on all processes at the same time. Going back to the mental model of running separate Julia instances, you have to look at what's in your @distributed loop and make sure that all functions and variables are actually defined on all processes. So e.g. to ensure that every process knows what ODEProblem is, you need to do using DifferentialEquations on all of them.

    • Finally, I added @sync to the distributed loop. This is referenced in the docs for the @distributed. Running @distributed macro with for loop spawns an asynchronous green thread (Task) handle for the distributed execution and moves forward to the next line. Since you want to wait until the execution actually get done the synchronization @sync is required. The issue with your original code is that without waiting for the green thread to complete (synchronizing), it will swallow errors and just return immediately, which is why your sol array is empty. You can see this if you run your original code, and only add the @sync - you will then get a TaskFailedException: on worker 2 - UndefVarError: #sys! not defined which tells you that your worker processes don't know about the functions you defined on the master process. In practice you will almost always want @sync execution, unless you you plan to run many such distributed loops in parallel. You also do not need the @sync keyword where you use aggregator function in the distributed loop (the @distributed (func) for i in 1:1000 form of the loop)

    Now what's the best solution here? The answer is I don't know. @threads is a great option to quickly parallelize thread-safe operation without rewriting the code, and is still actively being developed and improved so likely to become even better in future. There is also pmap in the Distributed standard library which gives you additional options, but this answer is long enough as it is! In my personal experience, nothing replaces (1) thinking about your problem and (2) benchmarking execution. Things you want to think about are the runtime of your problem (both total and for every individual operation that you want to distribute) and message passing/memory access requirements.

    The upside is that while you might have to spend a bit of effort thinking about things, Julia has a bunch of great options to make the most out of every hardware situation from a crappy old laptop with two cores (like the one I'm typing this from) to multi-node super high performance clusters (which has made Julia one of the very few programming language to achieve petaflop performance - although to be fair this is a little more tricky then my or Bill's answer :))