Search code examples
multidimensional-arrayjuliaatom-editordifferentialequations.jl

storing the solution of differential equation takes too long time


I am solving a differential equation whose solution comes up with size (2000,2000000), i.e., size(sol)=(2000,2000000). Now I want to store the time evolution of first 1000 variable to an array. for instance say,

      x=sol[1:1000,:];

and from this I need to calculate a value

    r1= abs.(mean(exp.(x*1im),dims=1)[1,:])[1:end]

Now storing the solution to x and then calculating r1 takes too much time and even something the system crash. Is there any solution to this ?

I am using Atom to run the code and the RAM of my system is 48 GB.


Solution

  • I am going to assume that the array sol is a Matrix{Float64}, though this would be something that is important to include in your question.

    Then I will point out some places where you are creating unnecessary allocations:

    x = sol[1:1000,:]
    r1 = abs.(mean(exp.(x*1im),dims=1)[1,:])[1:end]
    

    Starting from the innermost:

    1. x*1im creates a full new array of the same size as x.
    2. exp.() creates another full array of the same size as x.
    3. mean creates another array, though that is a reasonable thing to expect.
    4. [1,:] creates a new array the same size as one row of x
    5. [1:end] also creates a copy of the output vector, and this doesn't seem to have any purpose at all.
    6. x = sol[1:1000, :] allocates a large array.

    We want to get rid of most of these:

    1. (and 2.)\ You could write exp.(x .* 1im) with a dot after x, but it is more efficient to use the cis function, which does the same thing as exp(1im * x)

    1. Broadcasting is nice, but since you want to just average along the columns, there's no need for a temp array, just write mean(cis, x; dims=1) instead, that does not create the temporary.
    2. You can use the vec function to force a vector from a matrix without copying data.
    3. Just remove this.
    4. Try making a view to avoid copying.

    Result:

    # original
    function absmean(sol)
        x = sol[1:1000,:]
        r1 = abs.(mean(exp.(x*1im),dims=1)[1,:])[1:end]
        return r1
    end
    
    # new version
    function absmean2(sol)
        x = @view sol[1:1000, :]
        r1 = abs.(mean(cis, x; dims=1)) |> vec
        return r1
    end
    

    Benchmark:

    # making a smaller version to reduce benchmarking time
    julia> sol = randn(2000, 2000);
    
    julia> @btime absmean($sol);
      72.293 ms (18 allocations: 76.39 MiB)
    
    julia> @btime absmean2($sol);
      29.985 ms (15 allocations: 47.88 KiB)
    

    Not that much faster, but less than 1/1000th of the memory use. There are many more things you can do to improve performance, but these were the 'low-hanging fruit'.

    Final remark. It seems like your solution data are oriented along the rows of sol instead of along the columns. That is a bit strange, since Julia arrays are column-major. It doesn't matter in this particular calculation, but in other cases it could be detrimental to performance.

    Edit: I'll just add a version to show how you can get more dramatic speedups. This uses LoopVectorization.jl with multithreading (6 cores, 12 threads). Some re-writing of the code is necessary to get LoopVectorization to accept the code:

    using LoopVectorization
    
    function absmean4(sol)
        x = @view sol[1:1000, :]
        r = similar(x, size(x, 2))
        # use @turbo instead of @tturbo for single-threaded
        @tturbo for j in axes(x, 2)  
            s = zero(eltype(x))
            c = zero(eltype(x))
            for i in axes(x, 1)
                (a, b) = sincos(x[i, j])
                s += a
                c += b
            end
            r[j] = sqrt(s^2 + c^2) / size(x, 1)
        end
        return r
    end
    
    julia> @btime absmean4($sol);
      1.830 ms (1 allocation: 15.75 KiB)
    

    That's a 40x speedup vs the original, and 5000x less memory use.

    If you want to go absolutely all out, you can pre-allocate the output vector r and pass it into your function. Then you will have zero allocations.