
Julia: view of sparse matrix


I am rather confused by how view acts on sparse matrices in Julia:

using LinearAlgebra, SparseArrays, BenchmarkTools
v = SparseVector(1000, [212,554,873], [.3, .4, .3]);
A = sparse(diagm(rand(1000)));  # same effect observed for non-diag matrix
B = view(A, :, :);

julia> @btime A*v;
  1.536 μs (4 allocations: 23.84 KiB)

julia> @btime B*v;
  11.557 ms (5 allocations: 288 bytes)

B*v allocates far less memory but is roughly 8000x slower than A*v. What is going on, and what causes this performance difference?


Solution

  • Update June 2021: The missing specialized algorithm for sparse views mentioned below is now implemented, so the performance is much more reasonable these days (Julia 1.6+):

    julia> @btime A*v;
      2.063 μs (4 allocations: 23.84 KiB)
    
    julia> @btime B*v;
      2.836 μs (9 allocations: 25.30 KiB)
    

    And you can see that, yes, indeed, this is using a sparse specialization for *:

    julia> @which B*v
    *(A::Union{SparseArrays.AbstractSparseMatrixCSC{Tv, Ti}, SubArray{Tv, 2, var"#s832", Tuple{Base.Slice{Base.OneTo{Int64}}, I}, L} where {I<:AbstractUnitRange, var"#s832"<:SparseArrays.AbstractSparseMatrixCSC{Tv, Ti}, L}, LowerTriangular{Tv, var"#s831"} where var"#s831"<:Union{SparseArrays.AbstractSparseMatrixCSC{Tv, Ti}, SubArray{Tv, 2, var"#s832", Tuple{Base.Slice{Base.OneTo{Int64}}, I}, L} where {I<:AbstractUnitRange, var"#s832"<:SparseArrays.AbstractSparseMatrixCSC{Tv, Ti}, L}}, UpperTriangular{Tv, var"#s832"} where var"#s832"<:Union{SparseArrays.AbstractSparseMatrixCSC{Tv, Ti}, SubArray{Tv, 2, var"#s832", Tuple{Base.Slice{Base.OneTo{Int64}}, I}, L} where {I<:AbstractUnitRange, var"#s832"<:SparseArrays.AbstractSparseMatrixCSC{Tv, Ti}, L}}} where {Tv, Ti}, B::AbstractSparseVector{Tv, Ti} where {Tv, Ti}) in SparseArrays at /Users/mbauman/Julia/release-1.6/usr/share/julia/stdlib/v1.6/SparseArrays/src/linalg.jl:163
    
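That signature covers exactly the view pattern it names: all rows kept (a `Base.Slice`) plus a unit range of columns. As a sketch (assuming Julia 1.6+), a column-range view should therefore also multiply correctly and agree with the materialized slice:

```julia
using LinearAlgebra, SparseArrays

A = sparse(diagm(rand(1000)))
v = SparseVector(1000, [212, 554, 873], [0.3, 0.4, 0.3])

# view(A, :, r) with r a unit range matches the specialized method's
# signature, so it stays on the sparse path on Julia 1.6+.
C = view(A, :, 1:500)
w = SparseVector(500, [212], [1.0])

# The result agrees with multiplying the materialized slice.
@assert C * w ≈ A[:, 1:500] * w
```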

    Previously, that method wasn't implemented, so multiplication through the view fell back to the generic algorithm described in the original answer below:


    Old answer: It's not 8x slower, it's 8000x slower (note the units: microseconds vs. milliseconds). The reason is that Julia uses multiple dispatch to select specialized algorithms that exploit the sparse storage of the matrix and vector, completely skipping the sections of the array it knows will be zero. You can see which method gets called with @which:

    julia> @which A*v
    *(A::SparseArrays.AbstractSparseMatrixCSC, x::AbstractSparseArray{Tv,Ti,1} where Ti where Tv) in SparseArrays at /Users/mbauman/Julia/master/usr/share/julia/stdlib/v1.4/SparseArrays/src/sparsevector.jl:1722
    
    julia> @which B*v
    *(A::AbstractArray{T,2}, x::AbstractArray{S,1}) where {T, S} in LinearAlgebra at /Users/mbauman/Julia/master/usr/share/julia/stdlib/v1.4/LinearAlgebra/src/matmul.jl:50
    
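The dispatch difference is visible in the types themselves: a view of a sparse matrix is a `SubArray` wrapper, not a `SparseMatrixCSC`, so (before the specialized method existed) only the generic `AbstractMatrix * AbstractVector` signature matched. A small illustration:

```julia
using LinearAlgebra, SparseArrays

A = sparse(diagm(rand(1000)))
B = view(A, :, :)

# The view wraps the sparse matrix rather than being one itself,
# so methods dispatching on ::SparseMatrixCSC do not fire for B.
@assert B isa SubArray
@assert !(B isa SparseMatrixCSC)

# The underlying sparse storage is still reachable through the wrapper.
@assert parent(B) === A
```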

    The former is a highly specialized sparse implementation, whereas the latter is the generic fallback that works for any AbstractArray, including views. Now, ideally we'd detect trivial cases like view(A, :, :) and dispatch those to the sparse method too, but note that in general views need not preserve the sparsity and structure of the matrix:
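On versions without the specialized method, one possible workaround is to materialize a trivial view back into a concrete SparseMatrixCSC before multiplying; this copies the data, so it only pays off if the result is reused:

```julia
using LinearAlgebra, SparseArrays

A = sparse(diagm(rand(1000)))
v = SparseVector(1000, [212, 554, 873], [0.3, 0.4, 0.3])
B = view(A, :, :)

# sparse(B) copies the view into a concrete SparseMatrixCSC, so `*`
# dispatches to the specialized sparse kernel again.
Bsparse = sparse(B)
@assert Bsparse isa SparseMatrixCSC
@assert Bsparse * v ≈ A * v
```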

    julia> view(A, ones(Int, 1000), ones(Int, 1000))
    1000×1000 view(::SparseMatrixCSC{Float64,Int64}, [1, 1, 1, 1, 1, 1, 1, 1, 1, 1  …  1, 1, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1  …  1, 1, 1, 1, 1, 1, 1, 1, 1, 1]) with eltype Float64:
     0.306159  0.306159  0.306159  0.306159  …  0.306159  0.306159  0.306159
     0.306159  0.306159  0.306159  0.306159     0.306159  0.306159  0.306159
     0.306159  0.306159  0.306159  0.306159     0.306159  0.306159  0.306159
     ⋮                                       ⋱
     0.306159  0.306159  0.306159  0.306159     0.306159  0.306159  0.306159
     0.306159  0.306159  0.306159  0.306159     0.306159  0.306159  0.306159
     0.306159  0.306159  0.306159  0.306159  …  0.306159  0.306159  0.306159
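To see concretely why arbitrary views can't reuse the sparse machinery: the repeated-index view above turns a matrix with n stored nonzeros into one where every entry is nonzero, which no CSC-style algorithm could exploit. A small deterministic sketch:

```julia
using LinearAlgebra, SparseArrays

# A sparse identity matrix: exactly n stored nonzeros.
n = 100
A = sparse(1.0I, n, n)
@assert nnz(A) == n

# Indexing with all-ones makes every entry alias A[1, 1] == 1.0,
# so the "view of a sparse matrix" is completely dense.
V = view(A, ones(Int, n), ones(Int, n))
@assert count(!iszero, V) == n^2
```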