Tags: matlab, matrix, sparse-matrix, overhead

Is there overhead in accessing elements of a sparse matrix?


Let's say I have a sparse matrix A and I want to do heavy calculations on it. The calculations do not modify A; they only access its elements, e.g. taking a row of A and multiplying it by something. I wonder whether I should convert A to a full matrix before doing any calculations, or just work on it directly.
In other words, is accessing elements of a sparse matrix slower than accessing elements of a full matrix?


Solution

  • Slicing sparse matrices across columns is much faster than slicing across rows in MATLAB, so you should prefer accessing M(:,i) over M(i,:) (a timing sketch further below illustrates this).

    Internally, MATLAB uses Compressed Sparse Column (CSC) storage:

    • pr: the non-zero values, stored in a 1-D double array of length nzmax in column-major order (with a parallel array pi for the imaginary parts if the matrix is complex)
    • ir: an integer array holding the corresponding row indices
    • jc: an integer array of length n+1 (where n is the number of columns) containing column extent information; by definition, the last entry of jc equals nnz (the number of non-zeros stored)

    As an example, take the following sparse matrix:

    >> M = sparse([1 3 5 3 4 1 5], [1 1 1 2 2 4 4], [1 7 5 3 4 2 6], 5, 4);
    

    This is what it looks like stored in memory (I'm using 0-based indices for ir and jc):

    1 . . 2
    . . . .
    7 3 . .
    . 4 . .
    5 . . 6
    
    pr = 1 7 5 3 4 2 6
    ir = 0 2 4 2 3 0 4
    jc = 0 3 5 5 7
    
    nzmax = at least 7
    nnz = 7
    m = 5
    n = 4
    

    To retrieve the i-th column of the sparse matrix, M(:,i), it suffices to read pr(jc(i):jc(i+1)-1) (to keep it simple, I'm not paying attention to 0- vs 1-based indexing). Accessing a matrix row, on the other hand, involves more calculations and array traversals, and is no longer friendly to spatial locality.
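
    For the example matrix above you can cross-check this from the MATLAB prompt. pr, ir and jc are not exposed at the language level, but find, nnz and nzmax recover the same information (a quick sketch; the values in the comments are what I expect for this particular M):

    % rebuild the example matrix
    M = sparse([1 3 5 3 4 1 5], [1 1 1 2 2 4 4], [1 7 5 3 4 2 6], 5, 4);

    % find returns the stored non-zeros in column-major order,
    % so s matches pr and i gives the (1-based) row indices from ir
    [i, j, s] = find(M);   % i = [1 3 5 3 4 1 5]', s = [1 7 5 3 4 2 6]'

    nnz(M)        % 7 -- number of stored non-zeros
    nzmax(M)      % amount of allocated storage, at least 7

    % grabbing one column only touches that column's slice of pr/ir
    full(M(:,1))'          % 1 0 7 0 5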

    Here are some links to the MATLAB documentation for more info: Sparse Matrices, Manipulating Sparse Matrices
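
    To see the row-versus-column asymmetry in practice, here is a rough timing sketch (not a rigorous benchmark; the numbers depend on the machine, the density and the MATLAB version):

    % compare slicing columns vs. rows of a large random sparse matrix
    M = sprand(10000, 10000, 0.001);   % ~100k non-zeros

    tic
    for k = 1:1000
        c = M(:,k);    % column slice: one contiguous walk over pr/ir
    end
    tcol = toc;

    tic
    for k = 1:1000
        r = M(k,:);    % row slice: has to search through every column
    end
    trow = toc;

    fprintf('columns: %.3f s, rows: %.3f s\n', tcol, trow);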


    It's worth checking out the original paper by John R. Gilbert, Cleve Moler, and Robert Schreiber: "Sparse Matrices in MATLAB: Design and Implementation", SIAM Journal on Matrix Analysis and Applications, 13(1):333–356, 1992.

    Here are some quotes from the above paper to answer your question about the overhead of sparse storage:

    The computational complexity of simple array operations should be proportional to nnz, and perhaps also depend linearly on m or n, but be independent of the product m*n. The complexity of more complicated operations involves such factors as ordering and fill-in, but an objective of a good sparse matrix algorithm should be:

    The time required for a sparse matrix operation should be proportional to number of arithmetic operations on nonzero quantities.

    We call this the "time is proportional to flops" rule; it is a fundamental tenet of our design.

    and

    This (column-oriented sparse matrix) scheme is not efficient for manipulating matrices one element at a time: access to a single element takes time at least proportional to the logarithm of the length of its column; inserting or removing a nonzero may require extensive data movement. However, element-by-element manipulation is rare in MATLAB (and is expensive even in full MATLAB). Its most common application would be to create a sparse matrix, but this is more efficiently done by building a list [i,j,s] of matrix elements in arbitrary order and then using sparse(i,j,s) to create the matrix.

    The sparse data structure is allowed to have unused elements after the end of the last column of the matrix. Thus an algorithm that builds up a matrix one column at a time can be implemented efficiently by allocating enough space for all the expected nonzeros at the outset.
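
    As a concrete illustration of that advice (a sketch with made-up sizes, not code from the paper): collect the [i,j,s] triplets first and call sparse once, rather than inserting non-zeros one at a time:

    n  = 5000;                 % hypothetical matrix size
    nz = 20000;                % hypothetical number of non-zeros

    % fast: build the triplet lists in any order, then one call to
    % sparse (duplicate (i,j) pairs are summed together)
    i = randi(n, nz, 1);
    j = randi(n, nz, 1);
    s = rand(nz, 1);
    A = sparse(i, j, s, n, n);

    % slow: element-by-element insertion, which may have to shift the
    % stored arrays on every assignment; spalloc at least preallocates
    % room for nz non-zeros up front
    B = spalloc(n, n, nz);
    for k = 1:nz
        B(i(k), j(k)) = B(i(k), j(k)) + s(k);
    end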

    Also, section 3.1.4, "asymptotic complexity analysis", should be of interest (it's too long to quote here).