I am using DBSCAN.fit() on a dataset that is a single pandas column of word vectors, where each row holds a list with the same number of dimensions (30). It looks like this:
df['column']
2 [-0.003417029886667123, -0.0016105849274073794...
3 [-0.24330333298729837, 0.48110865717035506, 0....
4 [-0.0017016271879120766, 0.01266130386650884, ...
5 [0.002174357210089775, 0.004633570752676618, 0...
6 [0.008567001972125537, 0.0012244984475515731, ...
matrix = df['column'].as_matrix()
# DBSCAN implementation
db = DBSCAN(eps=0.06, min_samples=1)
db.fit(matrix)
clusters = db.labels_.tolist()
However, upon fitting the data, I am getting the following traceback:
----> 4 db.fit(matrix)
5 clusters = db.labels_.tolist()
/opt/conda/lib/python3.6/site-packages/sklearn/cluster/dbscan_.py in fit(self, X, y, sample_weight)
280
281 """
--> 282 X = check_array(X, accept_sparse='csr')
283 clust = dbscan(X, sample_weight=sample_weight,
284 **self.get_params())
/opt/conda/lib/python3.6/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
431 force_all_finite)
432 else:
--> 433 array = np.array(array, dtype=dtype, order=order, copy=copy)
434
435 if ensure_2d:
ValueError: setting an array element with a sequence.
I've read that this error occurs when one or more of the arrays do not have the same length as the rest. However, in my case this does not seem to be the problem. Please see below:
set(np.array([m]).shape[0] for m in matrix)
>> {1}
set(np.array([m]).shape[1] for m in matrix)
>> {30}
As you can see, all arrays are of the same len. Therefore what could be the problem?
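Converting the column this way does not give you a 2-D numeric array. It gives you a 1-D array of dtype object whose elements are Python lists. When scikit-learn's check_array tries to cast that object array to a float array, NumPy cannot turn a whole list into a single number, which is why you are seeing this error.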
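A quick way to confirm this is to inspect the shape and dtype of the converted array. The snippet below is a minimal sketch on made-up data: the three-element lists stand in for the 30-dimensional vectors in the question, and `.to_numpy()` is used as an assumed replacement for `.as_matrix()`, which newer pandas versions no longer provide.

```python
import pandas as pd

# Toy stand-in for the question's data: each cell holds a list of floats
# (3 values here for brevity; the real column has 30 per row).
df = pd.DataFrame(
    {"column": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]}
)

# Assumption: .to_numpy() plays the role of the older .as_matrix() call.
matrix = df["column"].to_numpy()

print(matrix.shape)     # a single dimension: one entry per row
print(matrix.dtype)     # object, not float64
print(type(matrix[0]))  # each element is still a plain Python list
```

If the array were suitable for DBSCAN, its shape would have two dimensions (rows by features) and a numeric dtype.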
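What you can do is convert the inner lists as well, so that the result is one 2-D numeric array of shape (n_samples, 30) rather than a 1-D array of lists, for example by building it with np.array(df['column'].tolist()).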
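One way to do this, sketched here on made-up data, is to turn the column into a list of lists first and let NumPy build the 2-D array from that. The toy values and the name `X` are illustrative only; the DBSCAN parameters are the ones from the question.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN

# Toy stand-in for the question's column (3 dimensions instead of 30).
df = pd.DataFrame(
    {"column": [[0.0, 0.0, 0.0], [0.01, 0.0, 0.0], [5.0, 5.0, 5.0]]}
)

# Build a real 2-D float array: one row per sample, one column per dimension.
# This only works when every inner list has the same length.
X = np.array(df["column"].tolist(), dtype=float)
print(X.shape)  # (n_samples, n_dimensions)

# Same parameters as in the question.
db = DBSCAN(eps=0.06, min_samples=1)
db.fit(X)
clusters = db.labels_.tolist()
print(clusters)
```

With `min_samples=1` every point is a core point, so no sample is labelled as noise. Passing `dtype=float` explicitly also makes a ragged column fail at the conversion step instead of inside `fit`.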