Tags: python, scipy, scikit-learn, sparse-matrix

Using a Sparse Matrix with sklearn Affinity Propagation


I am having problems using a scipy COO sparse matrix as input to AffinityPropagation, although the same data works perfectly fine as a numpy array.

As an example, say my similarity matrix is:

[[1.0, 0.9, 0.2]
 [0.9, 1.0, 0.0]
 [0.2, 0.0, 1.0]]

NumPy array version

import numpy as np
import sklearn.cluster

simnp = np.array([[1,0.9,0.2],[0.9,1,0],[0.2,0,1]])
affprop = sklearn.cluster.AffinityPropagation(affinity="precomputed")
affprop.fit(simnp)

works as expected.

Sparse Matrix version

import scipy.sparse as sps
import sklearn.cluster

simsps = sps.coo_matrix(([1,1,1,0.9,0.9,0.2,0.2],([0,1,2,0,1,0,2],[0,1,2,1,0,2,0])),(3,3))
affprop = sklearn.cluster.AffinityPropagation(affinity="precomputed")
affprop.fit(simsps)

raises the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python\Python27\lib\site-packages\sklearn\cluster\affinity_propagation_.py", line 301, in fit
    copy=self.copy, verbose=self.verbose, return_n_iter=True)
  File "C:\Python\Python27\lib\site-packages\sklearn\cluster\affinity_propagation_.py", line 90, in affinity_propagation
    preference = np.median(S)
  File "C:\Python\Python27\lib\site-packages\numpy\lib\function_base.py", line 3084, in median
    overwrite_input=overwrite_input)
  File "C:\Python\Python27\lib\site-packages\numpy\lib\function_base.py", line 2997, in _ureduce
    r = func(a, **kwargs)
  File "C:\Python\Python27\lib\site-packages\numpy\lib\function_base.py", line 3158, in _median
    return mean(part[indexer], axis=axis, out=out)
  File "C:\Python\Python27\lib\site-packages\numpy\core\fromnumeric.py", line 2878, in mean
    out=out, keepdims=keepdims)
  File "C:\Python\Python27\lib\site-packages\numpy\core\_methods.py", line 70, in _mean
    ret = ret.dtype.type(ret / rcount)
ValueError: setting an array element with a sequence.

My laptop does not have enough RAM to hold the dense matrix, which is why I want to use a sparse matrix.

What am I doing wrong?

Thanks.


Solution

  • http://scikit-learn.org/stable/modules/generated/sklearn.cluster.AffinityPropagation.html

    fit(X, y=None) Parameters:
    X: array-like, shape (n_samples, n_features) or (n_samples, n_samples)

    predict(X) Parameters:
    X : {array-like, sparse matrix}, shape (n_samples, n_features)

    http://scikit-learn.org/stable/modules/generated/sklearn.cluster.SpectralClustering.html

    fit(X, y=None) Parameters:
    X : array-like or sparse matrix, shape (n_samples, n_features)

    So some of the methods do accept a sparse matrix, but AffinityPropagation.fit does not make that claim. Is that a documentation omission, or an indication that it does not work with a sparse matrix? Your error indicates the latter: for one reason or another, it has not been adapted to work with sparse input.

    I'm not a user of scikit-learn, but I have answered a few questions about sparse matrices in that package. My impression is that its handling of sparse matrices is relatively new, and that in some cases it has to use todense() to turn the sparse ones back into dense matrices.
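
As a rough sketch of that dense fallback (assuming the dense matrix fits in memory, which the question says is not the case for the real data), the sparse similarity matrix can be converted with toarray() before fitting. The 3x3 matrix is the one from the question; the variable name simdense is just for illustration.

```python
import numpy as np
import scipy.sparse as sps
from sklearn.cluster import AffinityPropagation

# Sparse similarity matrix from the question (COO format).
data = [1, 1, 1, 0.9, 0.9, 0.2, 0.2]
rows = [0, 1, 2, 0, 1, 0, 2]
cols = [0, 1, 2, 1, 0, 2, 0]
simsps = sps.coo_matrix((data, (rows, cols)), shape=(3, 3))

# Convert to a dense ndarray; this allocates n_samples * n_samples floats,
# so it only helps when that amount of memory is available.
simdense = simsps.toarray()

affprop = AffinityPropagation(affinity="precomputed")
affprop.fit(simdense)
labels = affprop.labels_
print(labels)
```

This does nothing for the memory problem; it only confirms that the sparse data itself is fine once it is handed over as a dense array.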

    Like I wrote in my comment, numpy code, by itself, does not handle sparse matrices correctly. It only works if it delegates the action to sparse methods. It appears that np.median and np.mean do not properly delegate to sparse.coo_matrix.mean.

    Try:

    np.median(simsps)
    np.mean(simsps)
    simsps.mean()
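
To make the comparison concrete, here is a small sketch using the matrix from the question. It compares the sparse matrix's own mean() method with numpy reductions on a densified copy, and then tries np.median directly on the sparse matrix. What that last call does may depend on the numpy and scipy versions installed, so the sketch only catches and reports any exception rather than assuming a particular outcome.

```python
import numpy as np
import scipy.sparse as sps

data = [1, 1, 1, 0.9, 0.9, 0.2, 0.2]
rows = [0, 1, 2, 0, 1, 0, 2]
cols = [0, 1, 2, 1, 0, 2, 0]
simsps = sps.coo_matrix((data, (rows, cols)), shape=(3, 3))

# The sparse matrix's own method knows about the sparse layout.
sparse_mean = simsps.mean()

# numpy reductions applied to a dense copy of the same data.
dense = simsps.toarray()
dense_mean = np.mean(dense)
dense_median = np.median(dense)

# Calling numpy directly on the sparse object; behaviour is version-dependent,
# so record either the result or the exception.
try:
    direct_median = np.median(simsps)
except Exception as exc:
    direct_median = exc

print("sparse .mean():      ", sparse_mean)
print("np.mean(dense):      ", dense_mean)
print("np.median(dense):    ", dense_median)
print("np.median(simsps) -> ", repr(direct_median))
```

If the direct call fails while the other three lines print sensible numbers, that points at numpy not delegating to the sparse methods rather than at a problem with the matrix.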