
Julia: what is the performance difference between for-loop and map in mathematical optimization?


I have written two implementations that optimize a quadratic function: one uses a for loop, the other uses map. Although the two look almost identical, the map version is significantly faster - roughly 40 times faster.

I'm unsure about the reasons behind this speed difference. Could someone please explain this to me? Additionally, can you provide guidance on how to achieve a similar level of speed using the for loop?

using Optim

a = randn(5000000)
b = repeat([0, 1], inner = 2500000)

function f(a, b, x)
    obj = (x - a)^2 + b
    return obj
end

result_1 = []
@time result_1 = map(a, b) do ai, bi
    param_q = optimize(x -> f(ai, bi, x), -10.0, 10.0)
    param_q = Optim.minimizer(param_q)
    return param_q
end;
# 1.692008 seconds (10.17 M allocations: 810.298 MiB, 53.02% gc time, 10.49% compilation time)

result_2 = []
@time for i in 1:5000000
    param_q = optimize(x -> f(a[i], b[i], x), -10.0, 10.0)
    param_q = Optim.minimizer(param_q)
    result_2 = push!(result_2, param_q)
end

# 64.358054 seconds (1.29 G allocations: 20.025 GiB, 42.10% gc time, 0.15% compilation time)

Solution

  • Pre-allocation:

    One major difference comes from allocations: memory allocations are costly, and it's better to ask for all the memory you need in one go (which is called pre-allocating) instead of making many separate allocation requests.

    When you use map, it internally allocates the whole results array at once, and then assigns the results of the do-block into that pre-allocated array, which is pretty fast.

    In contrast, in your for loop, you're starting with a result_2 = [] and then push!-ing into that. This means that the memory isn't pre-allocated, and has to be requested again and again. This is one big part of the slowdown.
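    As a rough illustration of that cost, here is a minimal sketch - the grow and prealloc functions are made up for this example, and exact timings will vary by machine:

    # growing via push!: the backing buffer has to be reallocated
    # and copied repeatedly as the vector grows
    function grow(n)
        v = Float64[]
        for i in 1:n
            push!(v, i)
        end
        return v
    end

    # pre-allocating: one request for all the memory, then plain stores
    function prealloc(n)
        v = Vector{Float64}(undef, n)
        for i in 1:n
            v[i] = i
        end
        return v
    end

    @time grow(10^7);      # reports more allocations (and bytes copied)
    @time prealloc(10^7);  # a single allocation up front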

  • Lack of type specification:

    Another major reason is the lack of a concrete element type for the results array. [] doesn't have anything to indicate what types of elements the array will hold, so result_2 = [] makes result_2 a Vector{Any} - a vector that can hold anything. The price for this flexibility is that every element has to be stored behind a pointer (boxed) and its type checked at runtime, so storing (and retrieving) the values in memory isn't as simple anymore, which makes operating on this array much slower.
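    To see the Vector{Any} penalty in isolation, here's a small sketch (illustrative only; naive_sum is a made-up helper, and timings will vary):

    # the same kind of data, stored with different element types
    xs_any = Any[randn() for _ in 1:10^6]      # Vector{Any}: every element is boxed
    xs_f64 = Float64[randn() for _ in 1:10^6]  # Vector{Float64}: plain contiguous storage

    function naive_sum(v)
        s = 0.0
        for x in v
            s += x  # with Vector{Any}, each x is unboxed and dispatched at runtime
        end
        return s
    end

    @time naive_sum(xs_any)  # markedly slower
    @time naive_sum(xs_f64)  # tight, type-stable loop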

  • Faster loop:

    Implementing those changes, the loop becomes:

    # pre-allocating to the full size, and specifying the
    # array element type as Float64
    result_2 = Vector{Float64}(undef, length(a))
    for i in 1:5000000
        param_q = optimize(x -> f(a[i], b[i], x), -10.0, 10.0)
        result_2[i] = Optim.minimizer(param_q)
    end
    

    This should be pretty close to the speed of the map. I'd also recommend using for i in eachindex(a, b) instead of for i in 1:5000000 - it's more convenient in case the length of a changes in the future, and eachindex(a, b) also checks that a and b have the same length, avoiding one potential source of mistakes when you change things around in the future.

    (Once you change the loop to use eachindex, you can also put an @inbounds marker in front of the loop, to guarantee to Julia that none of the array accesses go out-of-bounds, and tell it to skip array bounds-checking. This may not be worth doing here though, since the time taken for bounds checking is minuscule compared to the time spent in the optimization routine.)
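    Putting both suggestions together, the loop would look like this (with @inbounds being optional here, as discussed):

    result_2 = Vector{Float64}(undef, length(a))
    @inbounds for i in eachindex(a, b)
        param_q = optimize(x -> f(a[i], b[i], x), -10.0, 10.0)
        result_2[i] = Optim.minimizer(param_q)
    end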