
How to transform an integer-valued sparse matrix into a 0/1 sparse matrix in Python


I have a sparse matrix from the sklearn bag-of-words vectorizer. It's a csr_matrix whose elements are word frequencies (counts) in each document. What I need instead is a 0/1 matrix where 1 means the word occurs in the document; I don't care about the actual frequency. Leaving the background aside, the problem is this: I have a sparse matrix like

2 3 4 0 0 0
0 0 0 0 0 8
0 0 0 2 0 0
0 0 0 0 0 0

and I want every nonzero element to become 1:

1 1 1 0 0 0
0 0 0 0 0 1
0 0 0 1 0 0
0 0 0 0 0 0

How can I achieve this? Converting with todense() and then looping over the entries doesn't seem like a good choice, since the sparse matrix is large. Is there a better way?
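To see why densifying is unnecessary, it helps to look at what a csr_matrix actually stores. The sketch below rebuilds the example counts (assumed to be plain integers) and prints the three arrays CSR keeps: only the five nonzero values, their column indices, and per-row pointers. Any 0/1 conversion only has to touch those stored values.

```python
import numpy as np
from scipy.sparse import csr_matrix

# The 4x6 count matrix from the question, written densely only for readability.
counts = np.array([
    [2, 3, 4, 0, 0, 0],
    [0, 0, 0, 0, 0, 8],
    [0, 0, 0, 2, 0, 0],
    [0, 0, 0, 0, 0, 0],
])
X = csr_matrix(counts)

# CSR stores only the nonzero values, their column indices,
# and row pointers marking where each row starts in data/indices.
print(X.nnz)      # 5
print(X.data)     # [2 3 4 8 2]
print(X.indices)  # [0 1 2 5 3]
print(X.indptr)   # [0 3 4 5 5]
```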


Solution

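  • Try csr_matrix.sign(). It should be exactly what you need (although I didn't try it myself): it computes the element-wise sign and returns a sparse matrix in the same format, so every positive count becomes 1 without ever building a dense copy. Word counts are never negative; in a matrix that could hold negative values, those entries would become -1 instead.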
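A minimal sketch of the sign() approach on the example matrix, alongside two other sparse-only alternatives that should give the same 0/1 result for nonnegative counts: comparing with zero, and overwriting the stored values of a copy. The variable names are illustrative; toarray() is called only to display this tiny example.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Example count matrix from the question (nonnegative integer counts).
X = csr_matrix(np.array([
    [2, 3, 4, 0, 0, 0],
    [0, 0, 0, 0, 0, 8],
    [0, 0, 0, 2, 0, 0],
    [0, 0, 0, 0, 0, 0],
]))

# 1) sign(): element-wise sign of the stored entries; the result stays CSR.
#    Counts are never negative, so every nonzero becomes 1.
B_sign = X.sign()

# 2) Comparison with zero: X != 0 yields a sparse boolean matrix,
#    and astype(int) turns True/False into 1/0.
B_cmp = (X != 0).astype(int)

# 3) Overwrite the stored values in a copy. eliminate_zeros() first drops
#    any explicitly stored zeros so they are not turned into 1.
B_data = X.copy()
B_data.eliminate_zeros()
B_data.data[:] = 1

# Densify only for display; the result is tiny here.
print(B_sign.toarray())
# [[1 1 1 0 0 0]
#  [0 0 0 0 0 1]
#  [0 0 0 1 0 0]
#  [0 0 0 0 0 0]]
```

All three stay sparse and work on the stored values only. Option 3 modifies the copy in place, so it avoids allocating a separate boolean matrix. If the counts come from sklearn's CountVectorizer (an assumption, since the question only says "bag-of-words vectorizer"), passing binary=True to it sets all nonzero counts to 1 directly, so no post-processing is needed.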