Search code examples
pythonnumpysparse-matrix

How to avoid loops for sparse Matrix?


I am having problem with 3 loops in Python. The purpose of this code is to calculate sparse matrix according to number of (x) unknown values of DATA. Here, x number is 13 , which means unrepeated values of DATA :(0, 4, 8, 12, 16, 20, 21, 22, 23, 24, 25, 26, 27). Then, len(DATA) is 4 that indicates rows number of A_sparse matrix. Then, I create sparse zero matrix with shape(4,13). Then, I take portion value to A_sparse if x is equal to unknown value.

Question

  • This code work properly, but with loops!!! I should remove loops, but how?

Here, I put example below:

Inputs:

  • DATA - indicates index; [[24, 20, 21, 22, 23], [24, 25, 26, 27], [25, 26, 27, 23], [0, 4, 8, 12, 16, 20]]
  • PORTION - [[ 1.16950604, 0.08724138, 1.5326188 , 1.5326188 , 0.74587448], [ 0.44409055, 1.51394507, 1.51394507, 0.95883188], [ 0.77097384, 1.77917041, 0.14615981, 0.185952 ], [ 0.93, 1.5 , 1.5 , 1.5 , 1.5 , 0.07]]

Output: - A_sparse - Sparse matrix;

def get_sparse(DATA, PORTION):

    x = np.unique( flatten(DATA) )
    A = np.zeros((len(DATA), len(x)))

    for i in range(len(DATA)):
        for m1,m2 in enumerate(DATA[i]):
            for j,k in enumerate(x):
                    if float(m2) == float(k):
                            A[i][j] = PORTION[i][m1]
    return A


>>> get_sparse(DATA, PORTION)
array([[ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
     0.08724138,  1.5326188 ,  1.5326188 ,  0.74587448,  1.16950604,
     0.        ,  0.        ,  0.        ],
   [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
     0.        ,  0.        ,  0.        ,  0.        ,  0.44409055,
     1.51394507,  1.51394507,  0.95883188],
   [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
     0.        ,  0.        ,  0.        ,  0.185952  ,  0.        ,
     0.77097384,  1.77917041,  0.14615981],
   [ 0.93      ,  1.5       ,  1.5       ,  1.5       ,  1.5       ,
     0.07      ,  0.        ,  0.        ,  0.        ,  0.        ,
     0.        ,  0.        ,  0.        ]])

I do often not prefer using loops when I use Python,so, I wanted to remove loops to make this code shorter and faster. Any answer would be appreciated!


Solution

  • Given the irregular lists of lists nature of DATA and PORTION, a numpy vectorization of this problem is not obvious. However, it helps me to visualize the constructor if I print get_sparse(DATA, DATA):

    [[ 0  0  0  0  0 20 21 22 23 24  0  0  0]
     [ 0  0  0  0  0  0  0  0  0 24 25 26 27]
     [ 0  0  0  0  0  0  0  0 23  0 25 26 27]
     [ 0  4  8 12 16 20  0  0  0  0  0  0  0]]
    

    Those look like regular column numbers, except the first ones step by 4. This version of your function builds a (4,28) array, and then trims out the zero columns.

    def get_sparse(DATA, PORTION):
        x = np.unique(itertools.chain(*DATA) )
        A = np.zeros((len(DATA), max(x)+1))
        for a, d, p in zip(A, DATA, PORTION):
            a[d] = p
        return A[:,x]
    

    To use sparse matrices, try:

    from scipy import sparse
    A = sparse.lil_matrix((4,28))
    A.data = PORTION
    A.rows = DATA
    

    Your DATA and PORTION lists of lists exactly match the internal format of the lil_matrix type. Once constructed A can be transformed into one of the other sparse types for math or slicing. A.toarray() gives the full 'dense' form.