
How to use the DBSCAN algorithm on a list of points in Python


I am new to image processing and Python. I have detected a number of features in an image and stored their pixel locations in a list:

My_list = [(x1,y1),(x2,y2),......,(xn,yn)]

I would like to use the DBSCAN algorithm to form clusters from these points. I am currently importing the built-in DBSCAN class from sklearn.cluster. If this format for the points is not compatible, which format should I use?

This is the error I get with the current format:

C:\Python\python.exe "F:/opencv_files/dbscan.py"
Traceback (most recent call last):
  File "F:/opencv_files/dbscan.py", line 83, in <module>
    db = DBSCAN(eps=0.5, min_samples=5).fit(X) # metric=X)
  File "C:\Python\lib\site-packages\sklearn\cluster\dbscan_.py", line 282, in fit
    X = check_array(X, accept_sparse='csr')
  File "C:\Python\lib\site-packages\sklearn\utils\validation.py", line 441, in check_array
    "if it contains a single sample.".format(array))
ValueError: Expected 2D array, got 1D array instead:
array=[].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

Solution

  • Your data is a list of tuples. Nothing in this structure prevents inconsistencies, such as tuples of different lengths. It is also a slow and memory-inefficient way to store the data, because every value is boxed as a Python object.

    Just call data = numpy.array(data) to convert your data into an efficient multidimensional numeric array. This array will then have a shape: (n, 2) for n two-dimensional points, which is the 2D layout that DBSCAN's fit expects. Note also that the error message prints array=[], so the X passed to fit was empty at that point; check that the list is actually populated before clustering.
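The conversion described above might look like the following minimal sketch. The coordinates, the variable name `my_list`, and the `eps` value are made up for illustration; `eps` is in the same units as the data (pixels here), so the `eps=0.5` from the question would most likely be too small for integer pixel locations and needs tuning for real data.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical pixel coordinates standing in for My_list from the question.
my_list = [
    (10, 10), (11, 10), (10, 11), (11, 11), (12, 10),            # first blob
    (200, 200), (201, 200), (200, 201), (201, 201), (202, 200),  # second blob
    (500, 50),                                                   # isolated point
]

# Convert the list of tuples into a 2D numeric array of shape (n_points, 2).
X = np.array(my_list, dtype=float)

# An empty list becomes a 1D array of shape (0,), which is what triggers
# "Expected 2D array, got 1D array instead: array=[]".
if X.ndim != 2 or X.shape[0] == 0:
    raise ValueError("Expected a non-empty list of (x, y) points")

# eps=3.0 is an assumed value chosen for this toy data, not a recommendation.
db = DBSCAN(eps=3.0, min_samples=5).fit(X)
labels = db.labels_  # one cluster label per point; -1 marks noise

print(X.shape)
print(labels)
```

Each entry in `labels` corresponds to the point at the same index in `my_list`, so the clustered coordinates can be recovered with a boolean mask such as `X[labels == 0]`.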