I am attempting to cluster a set of data points that are represented as a sparse scipy matrix, X. That is,
>>> print type(X)
<class 'scipy.sparse.csr.csr_matrix'>
>>> print X.shape
(57, 1038)
>>> print X[0]
(0, 223) 0.471313296962
(0, 420) 0.621222153695
(0, 1030) 0.442688836467
(0, 124) 0.442688836467
When I feed this matrix into an sklearn.mixture.GMM model, however, it raises the following ValueError:
File "/Library/Python/2.7/site-packages/sklearn/mixture/gmm.py", line 423, in fit
X = np.asarray(X, dtype=np.float)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/numpy/core/numeric.py", line 235, in asarray
return array(a, dtype, copy=False, order=order)
ValueError: setting an array element with a sequence.
However, I have been able to make the sklearn.cluster.KMeans model work perfectly on the same sparse matrix X.
Some other hopefully useful info: scipy version = 0.11.0, sklearn version = 0.14.1
Any ideas on what is going wrong? Thanks in advance!
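GMMs don't support sparse matrix input, while KMeans does. Your traceback shows GMM.fit passing X straight to np.asarray(X, dtype=np.float), which cannot turn a scipy sparse matrix into a float array, hence the ValueError. If an estimator supports sparse matrices, this is explicitly stated in the docstring for the relevant method.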
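If the matrix is small enough to fit in memory as a dense array (57 x 1038 certainly is), one workaround is to convert it with X.toarray() before fitting. The snippet below is only a minimal sketch: the random matrix is a stand-in for your data, and n_components=3 and covariance_type="diag" are arbitrary choices for illustration, not recommendations. It also assumes that on newer scikit-learn releases the class to use is GaussianMixture, falling back to GMM on old versions such as your 0.14.1; on a SciPy as old as 0.11 you may need sp.rand instead of sp.random.

```python
from __future__ import print_function

import numpy as np
import scipy.sparse as sp

try:
    # Newer scikit-learn releases provide GaussianMixture.
    from sklearn.mixture import GaussianMixture as MixtureModel
except ImportError:
    # Older releases (e.g. 0.14.1, as in the question) provide GMM.
    from sklearn.mixture import GMM as MixtureModel

# Stand-in for the real data: a 57 x 1038 CSR matrix that is mostly zeros.
rng = np.random.RandomState(0)
X = sp.random(57, 1038, density=0.004, format="csr", random_state=rng)

# Convert the sparse matrix to a plain 2-D numpy array of floats.
X_dense = X.toarray()

# Illustrative settings only; choose n_components for your own data.
model = MixtureModel(n_components=3, covariance_type="diag", random_state=0)
model.fit(X_dense)
labels = model.predict(X_dense)

print(X_dense.shape, labels.shape)
```

Keep in mind that toarray() materialises every zero, so this does not scale to large sparse matrices. In that case, reducing the dimensionality first (for example with a method whose docstring says it accepts sparse input) or staying with KMeans may be more practical.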