I am having problems using a scipy COO sparse matrix as input to scikit-learn's AffinityPropagation, although the same data works perfectly fine as a numpy array.
Just an example, say my similarity matrix is:
[[1.0, 0.9, 0.2]
[0.9, 1.0, 0.0]
[0.2, 0.0, 1.0]]
Numpy array version
import numpy as np
import sklearn.cluster
simnp = np.array([[1,0.9,0.2],[0.9,1,0],[0.2,0,1]])
affprop = sklearn.cluster.AffinityPropagation(affinity="precomputed")
affprop.fit(simnp)
works as expected.
Sparse Matrix version
import scipy.sparse as sps
import sklearn.cluster
simsps = sps.coo_matrix(([1,1,1,0.9,0.9,0.2,0.2],([0,1,2,0,1,0,2],[0,1,2,1,0,2,0])),(3,3))
affprop = sklearn.cluster.AffinityPropagation(affinity="precomputed")
affprop.fit(simsps)
returns the following error
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python\Python27\lib\site-packages\sklearn\cluster\affinity_propagation_.py", line 301, in fit
copy=self.copy, verbose=self.verbose, return_n_iter=True)
File "C:\Python\Python27\lib\site-packages\sklearn\cluster\affinity_propagation_.py", line 90, in affinity_propagation
preference = np.median(S)
File "C:\Python\Python27\lib\site-packages\numpy\lib\function_base.py", line 3084, in median
overwrite_input=overwrite_input)
File "C:\Python\Python27\lib\site-packages\numpy\lib\function_base.py", line 2997, in _ureduce
r = func(a, **kwargs)
File "C:\Python\Python27\lib\site-packages\numpy\lib\function_base.py", line 3158, in _median
return mean(part[indexer], axis=axis, out=out)
File "C:\Python\Python27\lib\site-packages\numpy\core\fromnumeric.py", line 2878, in mean
out=out, keepdims=keepdims)
File "C:\Python\Python27\lib\site-packages\numpy\core\_methods.py", line 70, in _mean
ret = ret.dtype.type(ret / rcount)
ValueError: setting an array element with a sequence.
My laptop does not have enough RAM to hold the full dense matrix, which is why I want to use a sparse matrix.
What am I doing wrong?
Thanks.
http://scikit-learn.org/stable/modules/generated/sklearn.cluster.AffinityPropagation.html
fit(X, y=None) Parameters:
X: array-like, shape (n_samples, n_features) or (n_samples, n_samples)
predict(X) Parameters:
X : {array-like, sparse matrix}, shape (n_samples, n_features)
http://scikit-learn.org/stable/modules/generated/sklearn.cluster.SpectralClustering.html
fit(X, y=None) Parameters:
X : array-like or sparse matrix, shape (n_samples, n_features)
So some of the methods do accept a sparse matrix, but AffinityPropagation.fit does not make that claim. Is that a documentation omission, or an indication that it does not work with a sparse matrix? Your error indicates the latter: for one reason or another, it has not been adapted to work with sparse input.
I'm not a user of scikit-learn, but I have answered a few questions about sparse matrices in that package. My impression is that the handling of sparse matrices is relatively new, and that in some cases they have to use todense() to turn the sparse ones back into dense matrices.
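As a rough sketch of that dense fallback, assuming the matrix is small enough to fit in memory (which is not the case on the laptop in the question), the sparse similarity matrix from the question can be converted before fitting. I use toarray() rather than todense() here because it returns a plain ndarray instead of an np.matrix; the variable name simdense is just for illustration.

```python
import numpy as np
import scipy.sparse as sps
from sklearn.cluster import AffinityPropagation

# Same similarity matrix as in the question, built in COO format.
data = [1, 1, 1, 0.9, 0.9, 0.2, 0.2]
rows = [0, 1, 2, 0, 1, 0, 2]
cols = [0, 1, 2, 1, 0, 2, 0]
simsps = sps.coo_matrix((data, (rows, cols)), shape=(3, 3))

# Convert to a dense ndarray; this needs memory for all n*n entries.
simdense = simsps.toarray()

# Fit on the dense copy, exactly as in the working numpy version.
affprop = AffinityPropagation(affinity="precomputed")
affprop.fit(simdense)
labels = affprop.labels_
print(labels)
```

This only sidesteps the error; it does not address the memory limit, since the dense copy is as large as the array the question is trying to avoid.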
Like I wrote in my comment, numpy code, by itself, does not handle sparse matrices correctly. It only works if it delegates the action to sparse methods. It appears that np.median and np.mean do not properly delegate to sparse.coo_matrix.mean.
Try:
np.median(simsps)
np.mean(simsps)
simsps.mean()
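A small sketch of what I think is going on, using the matrix from the question (this is my own illustration, not taken from the scikit-learn source): when numpy wraps a sparse matrix with np.asarray, it does not get a 3x3 numeric array but a 0-d object array that contains the sparse matrix as its single element. Reductions on that wrapper have no numbers to work with, while the sparse matrix's own mean method and the dense copy behave as expected.

```python
import numpy as np
import scipy.sparse as sps

# Same similarity matrix as in the question, built in COO format.
data = [1, 1, 1, 0.9, 0.9, 0.2, 0.2]
rows = [0, 1, 2, 0, 1, 0, 2]
cols = [0, 1, 2, 1, 0, 2, 0]
simsps = sps.coo_matrix((data, (rows, cols)), shape=(3, 3))

# numpy treats the sparse matrix as one opaque Python object.
wrapped = np.asarray(simsps)
print(wrapped.shape, wrapped.dtype)

# The sparse matrix's own method averages over all 9 entries,
# counting the implicit zeros.
sparse_mean = simsps.mean()

# The dense copy gives numpy real numbers to reduce.
dense = simsps.toarray()
dense_mean = dense.mean()
dense_median = np.median(dense)
print(sparse_mean, dense_mean, dense_median)
```

The two means agree, and the median is only available through the dense copy here, which matches the np.median(S) line in the traceback being the point of failure.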