I have a matrix with a very sparse index (the largest row and column indices are beyond 130000), but only a few of those rows/columns actually contain non-zero values.
I therefore want to shift the row and column indices so that only the non-zero rows/columns remain, numbered by the first N natural numbers.
Visually, I want an example matrix like this
1 0 1
0 0 0
0 0 1
to look like this
1 1
0 1
where a row/column is dropped only if all of its values are zero. Since I already have the matrix in a sparse format, I could simply create a dictionary, map every index to an increasing counter (for rows and columns separately), and get the result.
row_dict = {}
col_dict = {}
row_ind = 0
col_ind = 0
# el looks like this: (row, column, value)
for el in sparse_matrix:
    # assign the next free index the first time a row label is seen
    if el[0] not in row_dict:
        row_dict[el[0]] = row_ind
        row_ind += 1
    # same for the column label
    if el[1] not in col_dict:
        col_dict[el[1]] = col_ind
        col_ind += 1
# now recreate matrix with new index
But I was hoping there might be a built-in function for this in NumPy. Also note that I do not really know how to word the question, so there might well be a duplicate out there that I do not know of; any pointers in the right direction are appreciated.
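You can use np.unique with return_inverse=True. The inverse array gives, for every stored entry, the position of its row (or column) label in the sorted array of unique labels, which is exactly the compacted index: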
>>> import numpy as np
>>> from scipy import sparse
>>>
>>> A = np.random.randint(-100, 10, (10, 10)).clip(0, None)
>>> A
array([[6, 0, 5, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 7, 0, 0, 0, 0, 4, 9],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 4, 0],
       [9, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 4, 0, 0, 0, 0, 0, 0]])
>>> B = sparse.coo_matrix(A)
>>> B
<10x10 sparse matrix of type '<class 'numpy.int64'>'
with 8 stored elements in COOrdinate format>
>>> runq, ridx = np.unique(B.row, return_inverse=True)
>>> cunq, cidx = np.unique(B.col, return_inverse=True)
>>> C = sparse.coo_matrix((B.data, (ridx, cidx)))
>>> C.toarray()
array([[6, 5, 0, 0, 0],
       [0, 0, 7, 4, 9],
       [0, 0, 0, 4, 0],
       [9, 0, 0, 0, 0],
       [0, 0, 4, 0, 0]])
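
If you need this more than once, the same idea can be wrapped in a small function. This is only a sketch: `drop_empty_rows_cols` is a made-up helper name (not part of NumPy or SciPy), and it assumes the input is something `sparse.coo_matrix` accepts. It also returns the sorted original labels, so you can map the compacted indices back. Note that, unlike the dictionary loop in the question, `np.unique` numbers the labels in sorted order rather than in order of first appearance.

```python
import numpy as np
from scipy import sparse


def drop_empty_rows_cols(mat):
    """Relabel rows/columns that hold non-zero entries as 0..N-1.

    Hypothetical helper, not a SciPy function. Returns the compacted
    COO matrix plus the original row and column labels that were kept.
    """
    # copy so that eliminate_zeros() does not modify the caller's matrix
    coo = sparse.coo_matrix(mat, copy=True)
    coo.eliminate_zeros()  # ignore explicitly stored zeros

    # sorted unique labels, and each entry's position within them
    rows, ridx = np.unique(coo.row, return_inverse=True)
    cols, cidx = np.unique(coo.col, return_inverse=True)

    compact = sparse.coo_matrix(
        (coo.data, (ridx, cidx)), shape=(len(rows), len(cols))
    )
    return compact, rows, cols


# the 3x3 example from the question
A = np.array([[1, 0, 1],
              [0, 0, 0],
              [0, 0, 1]])
C, rows, cols = drop_empty_rows_cols(A)
print(C.toarray())  # [[1 1]
                    #  [0 1]]
print(rows, cols)   # [0 2] [0 2]
```

Here `rows[i]` is the original label of compacted row `i` (and likewise `cols[j]` for columns), so nothing about the original indexing is lost.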