
ValueError: setting an array element with a sequence on DBSCAN, although all vectors have the same dimensionality


I am using DBSCAN.fit() on a dataset that is a single pandas column of vectorized words, all with the same number of dimensions (30). It looks like this:

df['column']
2       [-0.003417029886667123, -0.0016105849274073794...
3       [-0.24330333298729837, 0.48110865717035506, 0....
4       [-0.0017016271879120766, 0.01266130386650884, ...
5       [0.002174357210089775, 0.004633570752676618, 0...
6       [0.008567001972125537, 0.0012244984475515731, ...

from sklearn.cluster import DBSCAN

matrix = df['column'].to_numpy()  # as_matrix() in older pandas; removed in pandas 1.0
# DBSCAN implementation
db = DBSCAN(eps=0.06, min_samples=1)
db.fit(matrix)
clusters = db.labels_.tolist()

However, upon fitting the data, I am getting the following traceback:

----> 4 db.fit(matrix)
      5 clusters = db.labels_.tolist()

/opt/conda/lib/python3.6/site-packages/sklearn/cluster/dbscan_.py in fit(self, X, y, sample_weight)
    280 
    281         """
--> 282         X = check_array(X, accept_sparse='csr')
    283         clust = dbscan(X, sample_weight=sample_weight,
    284                        **self.get_params())

/opt/conda/lib/python3.6/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
    431                                       force_all_finite)
    432     else:
--> 433         array = np.array(array, dtype=dtype, order=order, copy=copy)
    434 
    435         if ensure_2d:

ValueError: setting an array element with a sequence.
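The failing conversion can be reproduced without DBSCAN. The sketch below uses random 30-dimensional vectors as a hypothetical stand-in for the real word vectors, which are not shown. It also assumes that `check_array` converts object-dtype input to `float64`, which is what scikit-learn's numeric validation does, so the `np.array(..., dtype=np.float64)` call mirrors line 433 of the traceback.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 5 rows, each holding a 30-dimensional list,
# mimicking a column of word vectors.
rng = np.random.default_rng(0)
df = pd.DataFrame({"column": [list(rng.normal(size=30)) for _ in range(5)]})

# Same conversion as in the question (to_numpy() replaces the removed as_matrix()).
matrix = df["column"].to_numpy()
print(matrix.shape, matrix.dtype)  # one object (a list) per row, not a (5, 30) grid

# Roughly the step that fails inside check_array: casting the
# object array of lists to a float64 array.
try:
    np.array(matrix, dtype=np.float64)
    failed = False
except ValueError as exc:
    failed = True
    print("ValueError:", exc)
```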

I've read that this error occurs when one or more arrays do not have the same length as the rest. However, in my case this does not seem to be the problem; see below:

set(np.array([m]).shape[0] for m in matrix)
>> {1}

set(np.array([m]).shape[1] for m in matrix)
>> {30}

As you can see, all arrays have the same length (30). So what could be the problem?


Solution

  • The way you are converting your column does not produce a 2-D numeric array but a 1-D array of lists (dtype object), which is why you are seeing this error.

    What you can do is convert the inner lists into arrays as well, so that the result is a single 2-D array of shape (n_samples, 30).
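A minimal sketch of that conversion, again using random 30-dimensional vectors as hypothetical stand-in data. It uses `np.stack`, which is one way to turn the per-row lists into a single 2-D array; `np.array(df["column"].tolist())` is an equivalent alternative. This is an illustration, not necessarily the exact code from the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 5 rows of 30-dimensional word vectors stored as lists.
rng = np.random.default_rng(0)
df = pd.DataFrame({"column": [list(rng.normal(size=30)) for _ in range(5)]})

# Turn each inner list into an array and stack them row-wise into one
# (n_samples, n_features) float array.
X = np.stack(df["column"].to_numpy())
# X = np.array(df["column"].tolist())  # equivalent alternative

print(X.shape, X.dtype)  # (5, 30) float64
```

The resulting `X` can be passed to `db.fit(X)` in place of `matrix`.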