I have a sparse matrix of size (n x m):
sparse_dtm = dok_matrix((num_documents, vocabulary_size), dtype=np.float32)
for doc_index, document in enumerate(data):
    document_counter = Counter(document)
    for word in set(document):
        sparse_dtm[doc_index, word_to_index[word]] = document_counter[word]
Where:
Also, I have a list of length n:
sums = sparse_dtm.sum(1).tolist()
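For reference, here is a minimal self-contained sketch of the setup above. The toy `data` and `word_to_index` are assumptions made up for illustration, not my real corpus:

```python
from collections import Counter

import numpy as np
from scipy.sparse import dok_matrix

# Hypothetical toy corpus: each document is a list of tokens.
data = [["a", "b", "a"], ["b", "c"]]
word_to_index = {"a": 0, "b": 1, "c": 2}

num_documents = len(data)
vocabulary_size = len(word_to_index)

sparse_dtm = dok_matrix((num_documents, vocabulary_size), dtype=np.float32)
for doc_index, document in enumerate(data):
    document_counter = Counter(document)
    for word in set(document):
        sparse_dtm[doc_index, word_to_index[word]] = document_counter[word]

# sum(1) returns an (n x 1) matrix, so tolist() yields one
# single-element list per row rather than a flat list.
sums = sparse_dtm.sum(1).tolist()
print(sums)
```

Note that `sums` comes out as a nested list (one inner list per row).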
Now, I want to do an element-wise division in which each cell of row_i in sparse_dtm is divided by sums[i].
A naive approach, using the traditional Python element-wise division:
sparse_dtm / sums
leads to the following error:
TypeError: unsupported operand type(s) for /: 'csr_matrix' and 'list'
How can I perform this element-wise division?
If I understand correctly, you need to divide each row by its own sum, is that correct?
In that case, divide by the row sums shaped as a column (n x 1) instead of a Python list, so that they broadcast across each row:
sparse_dtm / sparse_dtm.sum(1).reshape(-1, 1)
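Depending on the SciPy version, dividing a sparse matrix by a dense one may return a dense result, and rows whose sum is zero need care. If the matrix is large, one option is to keep everything sparse explicitly. A minimal sketch, assuming a small made-up count matrix (`counts`, `row_sums` and `inv` are illustrative names, not from the question): multiply on the left by a diagonal matrix holding the reciprocal of each row sum.

```python
import numpy as np
from scipy.sparse import csr_matrix, diags, issparse

# Hypothetical 3 x 3 count matrix; the middle row is empty on purpose.
counts = csr_matrix(
    np.array([[1, 0, 3],
              [0, 0, 0],
              [2, 2, 0]], dtype=np.float32)
)

# Row sums as a flat 1-D array of length n.
row_sums = np.asarray(counts.sum(axis=1)).ravel()

# Reciprocal of each row sum; empty rows keep a factor of 0
# so that no division by zero takes place.
inv = np.zeros_like(row_sums)
nonzero = row_sums != 0
inv[nonzero] = 1.0 / row_sums[nonzero]

# Left-multiplying by diag(inv) scales row i by inv[i];
# the product stays sparse.
normalized = diags(inv) @ counts

print(normalized.toarray())
```

Each non-empty row of `normalized` then sums to 1, and empty rows stay all zeros.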
You can also do it with a pandas DataFrame, for example:
import numpy as np
import pandas as pd

row_num = 10
col_num = 5
# A small dense array standing in for the document-term matrix
sparse_dtm = np.zeros((row_num, col_num), dtype=np.float32)
for row in range(row_num):
    for col in range(col_num):
        value = (row + 1) * (col + 2)
        sparse_dtm[row, col] = value
df = pd.DataFrame(sparse_dtm)
print(df)
which gives
      0     1     2     3     4
0   2.0   3.0   4.0   5.0   6.0
1   4.0   6.0   8.0  10.0  12.0
2   6.0   9.0  12.0  15.0  18.0
3   8.0  12.0  16.0  20.0  24.0
4  10.0  15.0  20.0  25.0  30.0
5  12.0  18.0  24.0  30.0  36.0
6  14.0  21.0  28.0  35.0  42.0
7  16.0  24.0  32.0  40.0  48.0
8  18.0  27.0  36.0  45.0  54.0
9  20.0  30.0  40.0  50.0  60.0
and then divide each row by its sum:
df / df.sum(axis=1).values.reshape(-1, 1)
which gives
     0     1    2     3    4
0  0.1  0.15  0.2  0.25  0.3
1  0.1  0.15  0.2  0.25  0.3
2  0.1  0.15  0.2  0.25  0.3
3  0.1  0.15  0.2  0.25  0.3
4  0.1  0.15  0.2  0.25  0.3
5  0.1  0.15  0.2  0.25  0.3
6  0.1  0.15  0.2  0.25  0.3
7  0.1  0.15  0.2  0.25  0.3
8  0.1  0.15  0.2  0.25  0.3
9  0.1  0.15  0.2  0.25  0.3
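As a possible alternative in pandas, `DataFrame.div` with `axis=0` aligns a Series of row sums with the row index, so no reshaping is needed. A minimal sketch that rebuilds the same example values (the names `values` and `normalized` are just for illustration):

```python
import numpy as np
import pandas as pd

# Same example values as above: (row + 1) * (col + 2)
values = np.array(
    [[(row + 1) * (col + 2) for col in range(5)] for row in range(10)],
    dtype=np.float32,
)
df = pd.DataFrame(values)

# axis=0 matches the Series of row sums against the row index,
# so each row is divided by its own sum.
normalized = df.div(df.sum(axis=1), axis=0)
print(normalized.head(3))
```

This should produce the same proportions as the reshape-based division.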