Search code examples
numpyscipysparse-matrix

Scipy: Sparse Matrix giving incorrect values


Below is my code for generating my sparse matrix:

import numpy as np
import scipy

def sparsemaker(X, Y, Z):
    'X, Y, and Z are 2D arrays of the same size'
    x_, row = np.unique(X, return_inverse=True)
    y_, col = np.unique(Y, return_inverse=True)
    return scipy.sparse.csr_matrix( (Z.flat,(row,col)), shape=(x_.size, y_.size) )

>>> print sparsemaker(A, B, C) #A, B, and C are (220, 256) sized arrays.
(0, 0)  167064.269831
(0, 2)  56.6146564629
(0, 9)  53.8660340698
(0, 23) 80.6529717039
(0, 28) 0.0
(0, 33) 53.2379218326
(0, 40) 54.3868995375
 :          :

Now my input arrays are a bit large, so i don't know how to post them here (unless anyone has any ideas); but even looking at the first value, i can already tell something is wrong:

>>> test = sparsemaker(A, B, C)
>>> np.max(test.toarray())
167064.26983076424

>>> np.where(C==np.max(test.toarray()))
(array([], dtype=int64), array([], dtype=int64))

Does anyone know why this would happen? Where did that value come from?


Solution

  • You have repeated coordinates, and the constructor is adding them all up. Do the following :

    x_, row = np.unique(X, return_inverse=True)
    y_, col = np.unique(Y, return_inverse=True)
    print Z.flat[(row == 0) & (col == 0)].sum()
    

    and you should get that mysterious 167064.26983076424 printed out.

    EDIT The ugly code that follows works fine with small examples in averaging repeated entries, with some code borrowed from this other question, give it a try:

    def sparsemaker(X, Y, Z):
        'X, Y, and Z are 2D arrays of the same size'
        x_, row = np.unique(X, return_inverse=True)
        y_, col = np.unique(Y, return_inverse=True)
        indices = np.array(zip(row, col))
        _, repeats = np.unique(indices.view([('', indices.dtype)]*2),
                               return_inverse=True)
        counts = 1. / np.bincount(repeats)
        factor = counts[repeats]
    
        return scipy.sparse.csr_matrix((Z.flat * factor,(row,col)),
                                       shape=(x_.size, y_.size))