Search code examples
performancejuliamatrix-multiplication

Optimizing matrix multiplication with varying sizes


Suppose I have the following data generating process

using Random
using StatsBase

m_1    = [1.0 2.0]
m_2    = [1.0 2.0; 3.0 4.0]
DD     = []
y      = zeros(2,200)

for i in 1:100
    rand!(m_1)
    rand!(m_2)
    push!(DD, m_1)
    push!(DD, m_2)
end

idxs   = sample(1:200,10)
for i in idxs
    DD[i] = DD[1]
end

and suppose given the data, I have the following function


function test(y, DD, n)
    v_1   = [1 2]
    v_2   = [3 4]
    for j in 1:n
        for i in 1:size(DD,1)
            if size(DD[i],1) == 1
                y[1:size(DD[i],1),i] .= (v_1 * DD[i]')[1]
            else
                y[1:size(DD[i],1),i] = (v_2 * DD[i]')'
            end
        end
    end
end

I'm struggling to optimize the speed of test. In particular, memory allocation increases as I increase n. However, I'm not really allocating anything new.

The data generating process captures the fact that I don't know for sure the size of DD[i] beforehand. That is, the first time I call test, DD[1] could be a 2x2 matrix. The second time I call test, DD[1] could be a 1x2 matrix. I think this could be part of the issue with memory allocation: Julia doesn't know the sizes beforehand.

I'm completely stuck. I've tried @inbounds but that didn't help. Is there a way to improve this?


Solution

  • One important thing to check for performance is that Julia can understand the types. You can check this by running @code_warntype test(y, DD, 1), the output will make it clear that DD is of type Any[] (since you declared it that way). Working with Any can incur quite a performance penalty so declaring DD = Matrix{Float64}[] cuts the time to a third in my testing.

    I'm not sure how close this example is to the actual code you want to write but in this particular case the size(DD[i],1) == 1 branch can be replaced by a call to LinearAlgebra.dot:

    y[1:size(DD[i],1),i] .= dot(v_1, DD[i])
    

    this cuts the time by another 50% for me. Finally you can squeeze out just a tiny bit more by using mul! to perform the other multiplication in place:

    mul!(view(y, 1:size(DD[i],1),i:i), DD[i], v_2')
    

    Full example:

    using Random
    using LinearAlgebra
    
    DD = [rand(i,2) for _ in 1:100 for i in 1:2]
    y  = zeros(2,200)
    shuffle!(DD)
    
    function test(y, DD, n)
        v_1   = [1 2]
        v_2   = [3 4]'
        for j in 1:n
            for i in 1:size(DD,1)
                if size(DD[i],1) == 1
                    y[1:size(DD[i],1),i] .= dot(v_1, DD[i])
                else
                    mul!(view(y, 1:size(DD[i],1),i:i), DD[i], v_2)
                end
            end
        end
    end