
Element-wise division on a sparse matrix in Python


I have a sparse matrix of size (n x m):

from collections import Counter
import numpy as np
from scipy.sparse import dok_matrix

sparse_dtm = dok_matrix((num_documents, vocabulary_size), dtype=np.float32)
for doc_index, document in enumerate(data):
    document_counter = Counter(document)
    for word in set(document):
        sparse_dtm[doc_index, word_to_index[word]] = document_counter[word]

Where:

  • num_documents = n
  • vocabulary_size = m
  • data = list of tokenized lists

Also, I have a list with length n:

sums = sparse_dtm.sum(1).tolist()

Now, I want to do an element-wise division in which each cell of row_i in sparse_dtm is divided by sums[i].

A naive approach, using ordinary Python division:

sparse_dtm / sums

leads to the following error:

TypeError: unsupported operand type(s) for /: 'csr_matrix' and 'list'

How can I perform this element-wise division?


Solution

  • If I understand correctly, you need to divide each row by its own sum, is that right?

    In that case, you need the row sums as a column vector with one entry per row:

    sparse_dtm / sparse_dtm.sum(1).reshape(-1, 1)
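
As a complement to the one-liner above, here is a minimal sketch of a variant that stays in sparse form throughout and guards against empty rows, whose sum is zero. The small `dtm` matrix and the names `row_sums`, `inv` and `normalized` are made up for illustration; they are not from the question.

```python
import numpy as np
from scipy.sparse import csr_matrix, diags, issparse

# Hypothetical 3 x 4 document-term matrix; the last row is empty on purpose.
dtm = csr_matrix(np.array([
    [1, 0, 3, 0],
    [0, 2, 0, 2],
    [0, 0, 0, 0],
], dtype=np.float32))

# Row sums as a flat array of shape (n,).
row_sums = np.asarray(dtm.sum(axis=1)).ravel()

# Reciprocal of each row sum; empty rows keep 0 to avoid dividing by zero.
inv = np.zeros_like(row_sums)
nonzero = row_sums != 0
inv[nonzero] = 1.0 / row_sums[nonzero]

# Left-multiplying by a diagonal matrix scales row i by inv[i].
normalized = diags(inv) @ dtm

print(issparse(normalized))
print(normalized.toarray())
```

Multiplying by the reciprocals, rather than dividing, means only the stored non-zero entries are touched, which is usually what you want for a large document-term matrix.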
    

    You can also do it with a (dense) pandas DataFrame, for example:

    import numpy as np
    import pandas as pd

    row_num = 10
    col_num = 5
    # Dense toy array standing in for the matrix; every entry is filled below
    sparse_dtm = np.empty((row_num, col_num), dtype=np.float32)
    for row in range(row_num):
        for col in range(col_num):
            sparse_dtm[row, col] = (row + 1) * (col + 2)
    df = pd.DataFrame(sparse_dtm)
    print(df)
    

    gives

          0     1     2     3     4
    0   2.0   3.0   4.0   5.0   6.0
    1   4.0   6.0   8.0  10.0  12.0
    2   6.0   9.0  12.0  15.0  18.0
    3   8.0  12.0  16.0  20.0  24.0
    4  10.0  15.0  20.0  25.0  30.0
    5  12.0  18.0  24.0  30.0  36.0
    6  14.0  21.0  28.0  35.0  42.0
    7  16.0  24.0  32.0  40.0  48.0
    8  18.0  27.0  36.0  45.0  54.0
    9  20.0  30.0  40.0  50.0  60.0
    

    and then divide each row by its sum

    df / df.sum(axis=1).values.reshape(-1, 1)
    

    that gives

         0     1    2     3    4
    0  0.1  0.15  0.2  0.25  0.3
    1  0.1  0.15  0.2  0.25  0.3
    2  0.1  0.15  0.2  0.25  0.3
    3  0.1  0.15  0.2  0.25  0.3
    4  0.1  0.15  0.2  0.25  0.3
    5  0.1  0.15  0.2  0.25  0.3
    6  0.1  0.15  0.2  0.25  0.3
    7  0.1  0.15  0.2  0.25  0.3
    8  0.1  0.15  0.2  0.25  0.3
    9  0.1  0.15  0.2  0.25  0.3
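
The pandas division above can also be written without the manual reshape. This is only a sketch of an equivalent spelling using `DataFrame.div`, rebuilding the same toy values with `np.outer`; the name `normalized` is introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Same toy values as above: entry (row, col) is (row + 1) * (col + 2).
df = pd.DataFrame(np.outer(np.arange(1, 11), np.arange(2, 7)).astype(np.float32))

# axis=0 aligns the row sums with the row index,
# so row i is divided by its own sum.
normalized = df.div(df.sum(axis=1), axis=0)
print(normalized.round(2))
```

Here `axis=0` replaces the `.values.reshape(-1, 1)` step: pandas matches each row sum to its row by index label instead of relying on NumPy broadcasting.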