python, testing, machine-learning, unsupervised-learning, dbscan

Is testing the DBSCAN clustering algorithm possible? And if so, how?


I want to use the DBSCAN clustering algorithm to detect outliers in my dataset. Since this is an unsupervised learning approach, do I need to split my dataset into training and test data, or is testing DBSCAN simply not possible? For outlier detection, should I feed my entire dataset to the DBSCAN model?

If testing DBSCAN is possible, can you suggest ways of doing that in Python?


Solution

  • You don't need to split your data into train and test sets. However, if you want to evaluate your model, you need a labelled sample of your original data. There are also unsupervised evaluation methods, but they only tell you how one clustering performs relative to the others you try (different algorithms or different hyperparameters).

    I would suggest reading https://scikit-learn.org/stable/modules/clustering.html. Section 2.3.10, on clustering performance evaluation, covers the various methods for evaluating clustering models and the sklearn API needed to implement them.

    Choose whichever method best suits your problem statement.
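As a rough illustration of both kinds of evaluation mentioned above, here is a minimal sketch, not a definitive recipe. It assumes a synthetic dataset (two blobs plus uniformly scattered points standing in for labelled outliers), and the helper `evaluate_dbscan`, the `eps` grid and the choice of precision, recall and silhouette are all illustrative assumptions rather than anything prescribed by the answer. DBSCAN is fitted on the whole dataset, its noise points (label `-1`) are treated as predicted outliers and scored against the labels, and the silhouette score is computed as a label-free way to compare hyperparameter settings.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.metrics import precision_score, recall_score, silhouette_score


def evaluate_dbscan(X, y_true, eps, min_samples=5):
    """Hypothetical helper: fit DBSCAN on all of X and score noise points as outliers."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    y_pred = (labels == -1).astype(int)  # DBSCAN marks noise points with label -1

    # External evaluation: requires ground-truth outlier labels (1 = outlier).
    precision = precision_score(y_true, y_pred, zero_division=0)
    recall = recall_score(y_true, y_pred, zero_division=0)

    # Internal evaluation: silhouette over the clustered (non-noise) points only.
    # It is only defined when there are at least 2 clusters and more points than clusters.
    clustered = labels != -1
    n_clusters = len(set(labels[clustered]))
    if 2 <= n_clusters < clustered.sum():
        silhouette = silhouette_score(X[clustered], labels[clustered])
    else:
        silhouette = float("nan")

    return {
        "eps": eps,
        "n_clusters": n_clusters,
        "n_noise": int(y_pred.sum()),
        "precision": precision,
        "recall": recall,
        "silhouette": silhouette,
    }


# Synthetic stand-in for a labelled sample (an assumption, not real data):
# two dense blobs of inliers plus 20 uniformly scattered "outliers".
rng = np.random.default_rng(0)
X_inliers, _ = make_blobs(
    n_samples=300, centers=[[0, 0], [5, 5]], cluster_std=0.4, random_state=0
)
X_outliers = rng.uniform(low=-4, high=9, size=(20, 2))
X = np.vstack([X_inliers, X_outliers])
y_true = np.concatenate([np.zeros(300, dtype=int), np.ones(20, dtype=int)])

# Compare a few hyperparameter settings; the eps values are arbitrary examples.
results = [evaluate_dbscan(X, y_true, eps) for eps in (0.2, 0.5, 1.0)]
for r in results:
    print(r)
```

Because the scattered points can land inside a blob by chance, perfect precision and recall should not be expected even on this toy data. On real data, only the labelled sample would be scored with the external metrics, while the silhouette comparison can use every point.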