Search code examples
matlabmatrixsparse-matrixadjacency-matrix

Building a sparse matrix from another smaller matrix's values mapped by a logical array mask (MATLAB)?


I am working with large sparse matrices (sparse). I have a large sparse matrix of values which needs to be included into a larger sparse matrix. I have an array of logicals which indicates which rows and columns are to be filled up with the smaller matrix values. In this application the smaller of the two matrices is a graph stored as an adjacency matrix and the logicals indicate the node id positions in the larger matrix.

A small toy example to demonstrate what I am doing currently:

zz = sparse(4,4);  %create a sparse matrix of the final size desired
rr = rand(3,3);   %a matrix of values
logical_inds = logical([1 0 1 1]); %a logical array to index with the dimension size of zz
zz(logical_inds,logical_inds) = rr(:,:) %'rr' values are mapped into the subset of 'zz'

I see that the 2nd column of zz are zeros, and that the 2nd row are zero values as well. This is the output desired.

In my program get a warning that this "sparse indexing is likely to be slow", and it is. Occasionally when the matrices are very large the program terminates at this line.

How can I create this matrix (zz) with the sparse method? I am unsure how to create the row column indexes from the mask of logicals I have, and how to turn the values of rr into an array ordered appropriately for this new indexing.

**in general rr is very sparse although the mask of logicals addresses the full matrix


Solution

  • I think the problem is mostly due to implicit resizing during the allocation. Here's why I think that:

    %# test parameters
    N  = 5000;               %# Size of 1 dimension of the square sparse
    L  = rand(1,N) > 0.95;   %# 5% of rows/cols will be non-zero values
    M  = sum(L);            
    rr = rand(M);            %# the "data" to fill the sparse with 
    
    
    %# Method 1: direct logical indexing
    %# (your original method)
    zz1 = sparse(N,N);    
    tic    
        zz1(L,L) = rr;
    toc
    
    %# Method 2: test whether the conversion to logical col/row indices matters 
    zz2  = sparse(N,N);    
    inds = zz1~=0;    
    tic        
        zz2(inds) = rr;
    toc
    
    %# Method 3: test whether the conversion to linear indices matters 
    zz3 = sparse(N,N);
    inds = find(inds);
    tic        
        zz3(inds) = rr;
    toc
    
    %# Method 4: test whether implicit resizing matters    
    zz4 = spalloc(N,N, M*M);
    tic        
        zz4(inds) = rr;
    toc
    

    Results:

    Elapsed time is 3.988558 seconds. %# meh   M1 (original)
    Elapsed time is 3.916462 seconds. %# meh   M2 (expanded logicals)
    Elapsed time is 4.003222 seconds. %# meh   M3 (converted row/col indices)
    Elapsed time is 0.139986 seconds. %# WOW!  M4 (pre-allocated memory)
    

    So apparently (and amazingly), it would seem that MATLAB does not grow the existing sparse before the allocation (as you'd expect), but actually loops through the row/col indices and grows the sparse during the iteration. Therefore, it would seem one has to "help" MATLAB a bit:

    %# Create initial sparse
    zz1 = sparse(N,N);    
    
    %# ...
    %# Do any further operations until you can create rr: 
    %# ...
    
    rr = rand(M);            %# the "data" to fill the sparse with 
    
    
    %# Now that the size of the data is known, re-allocate space for the sparse:
    tic
        [i,j] = find(zz1); %# indices
        [m,n] = size(zz1); %# Sparse size (you can also use N of course)
        zz1 = sparse(i,j,nonzeros(zz1), m,n, M*M);
        zz1(L,L) = rr; %# logical or integer indices, doesn't really matter 
    toc
    

    Results (for the same N, L and rr):

    Elapsed time is 0.034950 seconds.  %# order of magnitude faster than even M4!
    

    See also this question+answers.