Tags: julia, sparse-matrix

Slow multiplication of transpose of sparse matrix


I'm having speed issues multiplying the transpose of a sparse matrix with a column vector.

In my code the matrix A is

    501×501 SparseMatrixCSC{Float64, Integer} with 1501 stored entries:
    ⠻⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸
    ⠀⠈⠻⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸
    ⠀⠀⠀⠈⠻⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸
    ⠀⠀⠀⠀⠀⠈⠻⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⢸
    ⠀⠀⠀⠀⠀⠀⠀⠈⠻⣦⡀⠀⠀⠀⠀⠀⠀⢸
    ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣦⡀⠀⠀⠀⠀⢸
    ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣦⡀⠀⠀⢸
    ⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣦⡀⢸
    ⠀⠀⠀⠀⠀⠀⠀⠀⠀⡀⠀⠀⠀⠀⠀⠈⠻⣾

These are the results I get from the multiplication with f = rand(Float64,501,1):

Method 1

    A_tr = transpose(A)
    @benchmark A_tr*f

    BenchmarkTools.Trial: 10000 samples with 1 evaluation.
     Range (min … max):  350.083 μs … 9.066 ms  ┊ GC (min … max): 0.00% … 95.44%
     Time  (median):     361.208 μs  ┊ GC (median): 0.00%
     Time  (mean ± σ):   380.269 μs ± 355.997 μs  ┊ GC (mean ± σ): 4.06% ± 4.15%
     Memory estimate: 218.70 KiB, allocs estimate: 11736.

Method 2

    A_tr = Matrix(transpose(A))
    @benchmark A_tr*f

    BenchmarkTools.Trial: 10000 samples with 1 evaluation.
     Range (min … max):  87.375 μs … 210.875 μs  ┊ GC (min … max): 0.00% … 0.00%
     Time  (median):     88.542 μs  ┊ GC (median): 0.00%
     Time  (mean ± σ):   89.286 μs ± 3.266 μs  ┊ GC (mean ± σ): 0.00% ± 0.00%
     Memory estimate: 4.06 KiB, allocs estimate: 1.

Method 3

    A_tr = sparse(Matrix(transpose(A)))
    @benchmark A_tr*f

    BenchmarkTools.Trial: 10000 samples with 9 evaluations.
     Range (min … max):  2.102 μs … 1.017 ms  ┊ GC (min … max): 0.00% … 99.40%
     Time  (median):     2.477 μs  ┊ GC (median): 0.00%
     Time  (mean ± σ):   2.725 μs ± 13.428 μs  ┊ GC (mean ± σ): 6.92% ± 1.41%
     Memory estimate: 4.06 KiB, allocs estimate: 1.

Why doesn't Method 1 perform similarly to Method 3? I'm probably missing something basic here.

Thank you for your help!


Solution

  • Using the following MWE:

    using LinearAlgebra, BenchmarkTools, SparseArrays
    
    A = sprand(501,501,0.005)
    At1 = transpose(A)
    At2 = sparse(Matrix(transpose(A)))
    f = rand(Float64,501,1)
    

    you will find no significant performance difference between

    @benchmark $At1*$f
    

    and

    @benchmark $At2*$f
    

    As was pointed out by @SGJ, the trick is to use a concrete primitive type as the index parameter of your container, i.e. SparseMatrixCSC{Float64, Int64} (which is what sprand(501,501,0.005) generates) rather than SparseMatrixCSC{Float64, Integer}. With the abstract type Integer as index parameter, every index access goes through dynamic dispatch and boxing, which bypasses the specialized fast multiplication kernels and explains the thousands of allocations in Method 1.
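    The fix can be sketched as follows. This is a hypothetical reconstruction of the asker's situation (the original code that produced the Integer-indexed matrix isn't shown); the point is only that converting to a concrete index type restores the fast path:

    ```julia
    using SparseArrays, LinearAlgebra

    # Rebuild the slow case: a sparse matrix whose index parameter is the
    # abstract type Integer (hypothetical stand-in for the asker's A).
    A = sprand(501, 501, 0.005)
    A_abs = SparseMatrixCSC{Float64,Integer}(A)

    # Fix: convert to a concrete index type (Int64, which sprand uses anyway).
    A_fast = SparseMatrixCSC{Float64,Int64}(A_abs)

    # Both give the same product; only the concrete-index version dispatches
    # to the specialized, allocation-free sparse kernels.
    f = rand(501)
    @assert transpose(A_abs) * f ≈ transpose(A_fast) * f
    ```

    After this conversion, benchmarking transpose(A_fast)*f should land in the same microsecond range as Method 3, with no per-element allocations.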


    @CRJ

    IIRC, transpose(A) makes a view of A through LinearAlgebra, which requires translating coordinates for every access. I don't think the fast ways of doing MV math will work through that interface. I'm not surprised that converting your transpose to a matrix object instead of trying to do math through the view is faster.

    transpose(A) yields Transpose(A), where Transpose is a lazy transpose wrapper. For sparse-matrix-dense-vector multiplication there are tailored methods that work directly through this wrapper, so no copy or materialization of A is required.
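    A small sketch of that laziness, using nothing beyond the standard libraries:

    ```julia
    using SparseArrays, LinearAlgebra

    A = sprand(501, 501, 0.005)
    At = transpose(A)

    # transpose(A) is a zero-copy wrapper: it holds a reference to A
    # rather than a rearranged copy of its data.
    @assert At isa Transpose
    @assert parent(At) === A

    # Multiplication dispatches to a sparse-aware method on the wrapper,
    # so the result matches the eagerly materialized transpose.
    f = rand(501)
    @assert At * f ≈ sparse(Matrix(At)) * f
    ```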