Search code examples
pythonnumpysparse-matrix

How can I create a sparse matrix instead of a dense one in this program?


I have this delta function which have 3 cases. mask1, mask2 and if none of them is satisfied delta = 0, since res = np.zeros

def delta(r, dr):
    res = np.zeros(r.shape)
    mask1 = (r >= 0.5*dr) & (r <= 1.5*dr)
    res[mask1] = (5-3*np.abs(r[mask1])/dr \
        - np.sqrt(-3*(1-np.abs(r[mask1])/dr)**2+1)) \
        /(6*dr)
    mask2 = np.logical_not(mask1) & (r <= 0.5*dr)
    res[mask2] = (1+np.sqrt(-3*(r[mask2]/dr)**2+1))/(3*dr)
    return res

Then I have this other function where I call the former and I construct an array, E

def matrix_E(nk,X,Y,xhi,eta,dx,dy):
   rx =  abs(X[np.newaxis,:] - xhi[:,np.newaxis])
   ry =  abs(Y[np.newaxis,:] - eta[:,np.newaxis])
   deltx = delta(rx,dx)
   delty = delta(ry,dy)
   E = deltx*delty
   return E

The thing is that most of the elements of E belong to the third case of delta, 0. Most means about 99%. So, I would like to have a sparse matrix instead of a dense one and not to stock the 0 elements in order to save memory.

Any ideas in how I could do it?


Solution

  • The normal way to create a sparse matrix is to construct three 1d arrays, with the nonzero values, and their i and j indexes. Then pass them to the coo_matrix function.

    The coordinates don't have to be in order, so you could construct the arrays for the 2 nonzero mask cases and concatenate them.

    Here's a sample construction using 2 masks

    In [107]: x=np.arange(5)
    
    In [108]: i,j,data=[],[],[]
    
    In [110]: mask1=x%2==0
    In [111]: mask2=x%2!=0
    
    In [112]: i.append(x[mask1])
    In [113]: j.append((x*2)[mask1])
    
    In [114]: i.append(x[mask2])
    In [115]: j.append(x[mask2])
    
    In [116]: i=np.concatenate(i)
    In [117]: j=np.concatenate(j)
    
    In [118]: i
    Out[118]: array([0, 2, 4, 1, 3])
    
    In [119]: j
    Out[119]: array([0, 4, 8, 1, 3])
    
    In [120]: M=sparse.coo_matrix((x,(i,j)))
    
    In [121]: print(M)
      (0, 0)    0
      (2, 4)    1
      (4, 8)    2
      (1, 1)    3
      (3, 3)    4
    
    In [122]: M.A
    Out[122]: 
    array([[0, 0, 0, 0, 0, 0, 0, 0, 0],
           [0, 3, 0, 0, 0, 0, 0, 0, 0],
           [0, 0, 0, 0, 1, 0, 0, 0, 0],
           [0, 0, 0, 4, 0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0, 0, 0, 0, 2]])
    

    A coo format stores those 3 arrays as is, but they get sorted and cleaned up when converted to other formats and printed.

    I can work on adapting this to your case, but this may be enough to get you started.


    It looks like X,Y,xhi,eta are 1d arrays. rx and ry are then 2d. delta returns a result the same shape as its input. E = deltx*delty suggests that deltax and deltay are the same shape (or at least broadcastable).

    Since sparse matrix has a .multiply method to do element wise multiplication, we can focus on producing sparse delta matrices.

    If you afford the memory to make rx, and a couple of masks, then you can also afford to make deltax (all the same size). Even through deltax has lots of zeros, it is probably fastest to make it dense.

    But let's try to case the delta calculation, as a sparse build.

    This looks like the essense of what you are doing in delta, at least with one mask:

    start with a 2d array:

    In [138]: r = np.arange(24).reshape(4,6)    
    In [139]: mask1 = (r>=8) & (r<=16)
    In [140]: res1 = r[mask1]*0.2
    In [141]: I,J = np.where(mask1)
    

    the resulting vectors are:

    In [142]: I
    Out[142]: array([1, 1, 1, 1, 2, 2, 2, 2, 2], dtype=int32)    
    In [143]: J
    Out[143]: array([2, 3, 4, 5, 0, 1, 2, 3, 4], dtype=int32)
    In [144]: res1
    Out[144]: array([ 1.6,  1.8,  2. ,  2.2,  2.4,  2.6,  2.8,  3. ,  3.2])
    

    Make a sparse matrix:

    In [145]: M=sparse.coo_matrix((res1,(I,J)), r.shape)    
    In [146]: M.A
    Out[146]: 
    array([[ 0. ,  0. ,  0. ,  0. ,  0. ,  0. ],
           [ 0. ,  0. ,  1.6,  1.8,  2. ,  2.2],
           [ 2.4,  2.6,  2.8,  3. ,  3.2,  0. ],
           [ 0. ,  0. ,  0. ,  0. ,  0. ,  0. ]])
    

    I could make another sparse matrix with mask2, and add the two.

    In [147]: mask2 = (r>=17) & (r<=22)    
    In [148]: res2 = r[mask2]*-0.4
    In [149]: I,J = np.where(mask2)
    In [150]: M2=sparse.coo_matrix((res2,(I,J)), r.shape)
    In [151]: M2.A
    Out[151]: 
    array([[ 0. ,  0. ,  0. ,  0. ,  0. ,  0. ],
           [ 0. ,  0. ,  0. ,  0. ,  0. ,  0. ],
           [ 0. ,  0. ,  0. ,  0. ,  0. , -6.8],
           [-7.2, -7.6, -8. , -8.4, -8.8,  0. ]])
    
    ...
    In [153]: (M1+M2).A
    Out[153]: 
    array([[ 0. ,  0. ,  0. ,  0. ,  0. ,  0. ],
           [ 0. ,  0. ,  1.6,  1.8,  2. ,  2.2],
           [ 2.4,  2.6,  2.8,  3. ,  3.2, -6.8],
           [-7.2, -7.6, -8. , -8.4, -8.8,  0. ]])
    

    Or I could concatenate the res1 and res2, etc and make one sparse matrix:

    In [156]: I1,J1 = np.where(mask1)
    In [157]: I2,J2 = np.where(mask2)
    In [158]: res12=np.concatenate((res1,res2))
    In [159]: I12=np.concatenate((I1,I2))
    In [160]: J12=np.concatenate((J1,J2))
    In [161]: M12=sparse.coo_matrix((res12,(I12,J12)), r.shape)
    
    In [162]: M12.A
    Out[162]: 
    array([[ 0. ,  0. ,  0. ,  0. ,  0. ,  0. ],
           [ 0. ,  0. ,  1.6,  1.8,  2. ,  2.2],
           [ 2.4,  2.6,  2.8,  3. ,  3.2, -6.8],
           [-7.2, -7.6, -8. , -8.4, -8.8,  0. ]])
    

    Here I choose the masks so the nonzero values don't overlap, but both methods work if they did. It's a delibrate design feature of the coo format that values for repeated indices are summed. It's very handy feature when creating sparse matries for finite element problems.


    I can also get index arrays by creating a sparse matrix from the mask:

    In [179]: rmask1=sparse.coo_matrix(mask1)
    
    In [180]: rmask1.row
    Out[180]: array([1, 1, 1, 1, 2, 2, 2, 2, 2], dtype=int32)
    
    In [181]: rmask1.col
    Out[181]: array([2, 3, 4, 5, 0, 1, 2, 3, 4], dtype=int32)
    In [184]: sparse.coo_matrix((res1, (rmask1.row, rmask1.col)),rmask1.shape).A
    Out[184]: 
    array([[ 0. ,  0. ,  0. ,  0. ,  0. ,  0. ],
           [ 0. ,  0. ,  1.6,  1.8,  2. ,  2.2],
           [ 2.4,  2.6,  2.8,  3. ,  3.2,  0. ],
           [ 0. ,  0. ,  0. ,  0. ,  0. ,  0. ]])
    

    I can't, though, create a mask from a sparse version of r. (r>=8) & (r<=16). That kind of inequality test has not been implemented for sparse matrices. But that might not matter, since r is probably not sparse.