
Access element from csr_matrix

I have created a sparse matrix using SciPy's dok_matrix class as follows:

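import numpy as np
from collections import Counter
from scipy.sparse import dok_matrix

# One row per document, one column per vocabulary word
sparse_dtm = dok_matrix((num_documents, vocabulary_size), dtype=np.float32)
for doc_index, document in enumerate(data_list):
    document_counter = Counter(document)
    for word in set(document):
        sparse_dtm[doc_index, word_index[word]] = document_counter[word]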

Here, data_list is a list of tokenized documents (one list of tokens per document), and word_index maps each word to its column index.
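For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same construction on a tiny made-up corpus. The contents of data_list, vocabulary and word_index below are hypothetical stand-ins for the real data. It iterates over Counter.items() directly, which visits each distinct word once, so the separate set(document) loop is not needed.

```python
from collections import Counter

import numpy as np
from scipy.sparse import dok_matrix

# Hypothetical toy corpus: a list of tokenized documents
data_list = [
    ["apple", "banana", "apple"],
    ["banana", "cherry"],
]

# Hypothetical vocabulary mapping; sorted() gives a reproducible column order
vocabulary = sorted({word for document in data_list for word in document})
word_index = {word: i for i, word in enumerate(vocabulary)}

num_documents = len(data_list)
vocabulary_size = len(vocabulary)

sparse_dtm = dok_matrix((num_documents, vocabulary_size), dtype=np.float32)
for doc_index, document in enumerate(data_list):
    # Counter yields each distinct word exactly once with its count
    for word, count in Counter(document).items():
        sparse_dtm[doc_index, word_index[word]] = count

print(sparse_dtm.toarray())
```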

After creating sparse_dtm, I would like to retrieve all values in the first row.

From the documentation, I know that the .getrow(i) method returns row i as a (1 x n) sparse matrix.

However, so far I have not been able to retrieve the keys and values stored in the resulting csr_matrix:

sparse_dtm.getrow(0).keys()
AttributeError: keys not found

sparse_dtm.getrow(0)[0]
<1x90140 sparse matrix of type '<class 'numpy.float32'>'
    with 576 stored elements in Compressed Sparse Row format>
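The keys() attempt fails because the row comes back as a csr_matrix, which is not dict-like; keys() and items() belong to the DOK format. One way to get dictionary-style access back is to convert the row with .todok(). A minimal sketch on a small hypothetical matrix (toy_dtm stands in for sparse_dtm); the explicit .tocsr() just makes the row's format unambiguous:

```python
import numpy as np
from scipy.sparse import dok_matrix

# Hypothetical 2 x 5 matrix standing in for sparse_dtm
toy_dtm = dok_matrix((2, 5), dtype=np.float32)
toy_dtm[0, 1] = 6.0
toy_dtm[0, 3] = 3.0
toy_dtm[1, 4] = 1.0

row = toy_dtm.tocsr().getrow(0)  # 1 x 5 csr_matrix: has no keys() method
row_dok = row.todok()            # DOK format supports keys()/items()

# DOK keys are (row, column) tuples; the row part is always 0 here
row_items = {int(col): float(value) for (_, col), value in row_dok.items()}
print(row_items)
```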

sparse_dtm does contain the right information though:

print(sparse_dtm.getrow(0))
Output: (0, 21018)    6.0
        (0, 76741)    3.0
        (0, 14008)    1.0
        (0, 54143)    2.0
        (0, 11866)    1.0
        ...
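Each (0, j) value line in that printout is one stored entry. For a 1 x n CSR matrix, the column indices of the stored entries are in the .indices attribute and the values are in .data, position for position, so they can be read without building a dense row. As the printout shows, the stored order is not necessarily sorted; sort_indices() sorts them in place. A minimal sketch on a small hypothetical matrix:

```python
import numpy as np
from scipy.sparse import dok_matrix

# Hypothetical stand-in for sparse_dtm
toy_dtm = dok_matrix((2, 6), dtype=np.float32)
toy_dtm[0, 4] = 6.0
toy_dtm[0, 2] = 3.0
toy_dtm[1, 5] = 1.0

row = toy_dtm.tocsr().getrow(0)  # 1 x 6 csr_matrix
row.sort_indices()               # optional: order entries by column

# For a single-row CSR matrix, indices[k] is the column of data[k]
columns = row.indices
values = row.data

for col, value in zip(columns, values):
    print(col, value)
```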

How can I access elements from row i and retrieve its keys and values?

Solution

To obtain the values, convert the row to a dense NumPy array:

row_zero = sparse_dtm.getrow(0).toarray()[0]

This returns every value in the row, including the zeros (90140 entries here). To obtain the key (column index) of each stored value, take the indices of the non-zero entries:

indices = row_zero.nonzero()[0]
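The same column indices can also be obtained without densifying the row: calling .nonzero() on the sparse row returns a (row_indices, column_indices) pair, so the columns are the second element. A minimal sketch comparing both routes on a small hypothetical matrix:

```python
import numpy as np
from scipy.sparse import dok_matrix

# Hypothetical stand-in for sparse_dtm
toy_dtm = dok_matrix((2, 6), dtype=np.float32)
toy_dtm[0, 4] = 6.0
toy_dtm[0, 2] = 3.0
toy_dtm[1, 5] = 1.0

row = toy_dtm.tocsr().getrow(0)

# Dense route: materialize all 6 columns, then find the non-zeros
dense_indices = row.toarray()[0].nonzero()[0]

# Sparse route: nonzero() returns (rows, cols); keep only the columns
sparse_indices = row.nonzero()[1]

print(sorted(dense_indices.tolist()), sorted(sparse_indices.tolist()))
```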

Then look up these indices in index_to_word, which I created as follows:

word_to_index = dict()
index_to_word = dict()

for i, word in enumerate(vocabulary):
    word_to_index[word] = i
    index_to_word[i] = word

Here, vocabulary is the set of all words in the corpus.
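Putting the pieces together, a row can be turned into a word-to-count dictionary by zipping the CSR row's .indices and .data and looking each index up in index_to_word. A minimal sketch with a tiny hypothetical vocabulary and matrix; the vocabulary is sorted here because the iteration order of a set of strings can change between interpreter runs, which would shuffle the column assignment:

```python
import numpy as np
from scipy.sparse import dok_matrix

# Hypothetical vocabulary; sorted() keeps the column order stable across runs
vocabulary = sorted({"apple", "banana", "cherry", "date"})
word_to_index = {word: i for i, word in enumerate(vocabulary)}
index_to_word = {i: word for word, i in word_to_index.items()}

# Hypothetical stand-in for sparse_dtm
toy_dtm = dok_matrix((2, len(vocabulary)), dtype=np.float32)
toy_dtm[0, word_to_index["banana"]] = 2.0
toy_dtm[0, word_to_index["date"]] = 1.0
toy_dtm[1, word_to_index["apple"]] = 5.0

row = toy_dtm.tocsr().getrow(0)

# Map each stored column index back to its word
word_counts = {
    index_to_word[int(col)]: float(value)
    for col, value in zip(row.indices, row.data)
}
print(word_counts)
```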