I'm having speed issues multiplying the transpose of a sparse matrix with a column vector.
In my code the matrix A is
501×501 SparseMatrixCSC{Float64, Integer} with 1501 stored entries:
⠻⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸
⠀⠈⠻⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸
⠀⠀⠀⠈⠻⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢸
⠀⠀⠀⠀⠀⠈⠻⣦⡀⠀⠀⠀⠀⠀⠀⠀⠀⢸
⠀⠀⠀⠀⠀⠀⠀⠈⠻⣦⡀⠀⠀⠀⠀⠀⠀⢸
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣦⡀⠀⠀⠀⠀⢸
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣦⡀⠀⠀⢸
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠻⣦⡀⢸
⠀⠀⠀⠀⠀⠀⠀⠀⠀⡀⠀⠀⠀⠀⠀⠈⠻⣾
These are the results I get from the multiplication with f = rand(Float64,501,1):
Method 1
A_tr = transpose(A)
@benchmark A_tr*f
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 350.083 μs … 9.066 ms ┊ GC (min … max): 0.00% … 95.44%
Time (median): 361.208 μs ┊ GC (median): 0.00%
Time (mean ± σ): 380.269 μs ± 355.997 μs ┊ GC (mean ± σ): 4.06% ± 4.15%
Memory estimate: 218.70 KiB, allocs estimate: 11736.
Method 2
A_tr = Matrix(transpose(A))
@benchmark A_tr*f
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
Range (min … max): 87.375 μs … 210.875 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 88.542 μs ┊ GC (median): 0.00%
Time (mean ± σ): 89.286 μs ± 3.266 μs ┊ GC (mean ± σ): 0.00% ± 0.00%
Memory estimate: 4.06 KiB, allocs estimate: 1.
Method 3
A_tr = sparse(Matrix(transpose(A)))
@benchmark A_tr*f
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
Range (min … max): 2.102 μs … 1.017 ms ┊ GC (min … max): 0.00% … 99.40%
Time (median): 2.477 μs ┊ GC (median): 0.00%
Time (mean ± σ): 2.725 μs ± 13.428 μs ┊ GC (mean ± σ): 6.92% ± 1.41%
Memory estimate: 4.06 KiB, allocs estimate: 1.
Why doesn't Method 1 give performance similar to Method 3? I'm probably missing something basic here.
Thank you for your help!
Using the following MWE
using LinearAlgebra, BenchmarkTools, SparseArrays
A = sprand(501,501,0.005)
At1 = transpose(A)
At2 = sparse(Matrix(transpose(A)))
f = rand(Float64,501,1)
you will find no significant performance difference between
@benchmark $At1*$f
and
@benchmark $At2*$f
As was pointed out by @SGJ, the trick is to have a concrete primitive type as the index parameter of your container, i.e. SparseMatrixCSC{Float64, Int64} (which is what sprand(501,501,0.005) generates) instead of SparseMatrixCSC{Float64, Integer}, the abstract index type shown for your matrix A.
@CRJ
IIRC, transpose(A) makes a view of A through LinearAlgebra, which requires translating coordinates for every access. I don't think the fast ways of doing MV math will work through that interface. I'm not surprised that converting your transpose to a matrix object instead of trying to do math through the view is faster.
transpose(A) yields Transpose(A), where Transpose is a lazy transpose wrapper. For sparse-matrix times dense-vector multiplication there are tailored methods that work directly through this wrapper, so no transposed copy of A ever needs to be materialized.
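A small illustration of that lazy behaviour (the types in the comments are indicative for a recent Julia 1.x setup, shown here only as a sketch):
using LinearAlgebra, SparseArrays
A = sprand(501,501,0.005)      # SparseMatrixCSC{Float64, Int64}
At = transpose(A)              # no data is copied
typeof(At)                     # Transpose{Float64, SparseMatrixCSC{Float64, Int64}}
parent(At) === A               # true: the wrapper only refers back to A
f = rand(501)
y = At * f                     # dispatches to a sparse-aware method; no dense or transposed copy of A is built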