Use sklearn DBSCAN model to classify new entries


I have a huge "dynamic" dataset and I'm trying to find interesting clusters in it.

After running a lot of different unsupervised clustering algorithms I have found a configuration of DBSCAN which gives coherent results.

I would like to take the model that DBSCAN builds from my test data and apply it to other datasets without re-running the algorithm. I cannot run the algorithm over the whole dataset because it would run out of memory, and since the data is dynamic, the model might no longer make sense to me at a later time.

Using sklearn, I have found that other clustering algorithms - like MiniBatchKMeans - have a predict method, but DBSCAN does not.
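As a quick check of that observation, the sketch below only inspects the estimator classes. It assumes a reasonably recent scikit-learn, and the dictionary name `has_predict` is just for illustration.

```python
from sklearn.cluster import DBSCAN, MiniBatchKMeans

# Check which methods the estimator classes expose.
# DBSCAN offers fit_predict (cluster the data it is fitted on),
# but no separate predict for data it has not seen.
has_predict = {
    "MiniBatchKMeans.predict": hasattr(MiniBatchKMeans, "predict"),
    "DBSCAN.predict": hasattr(DBSCAN, "predict"),
    "DBSCAN.fit_predict": hasattr(DBSCAN, "fit_predict"),
}

for name, available in has_predict.items():
    print(name, available)
```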

I understand that for MiniBatchKMeans the centroids uniquely define the model. But such a thing might not exist for DBSCAN.

So my question is: what is the proper way to extrapolate the DBSCAN model? Should I train a supervised learning algorithm on the output that DBSCAN gave for my test dataset? Or is there something intrinsic to the DBSCAN model that can be used to classify new data without re-running the algorithm?


Solution

  • Train a classifier based on your model.

    DBSCAN is not easy to adapt to new objects, because you would eventually need to adjust minPts. Adding points to DBSCAN can cause clusters to merge, which you probably do not want to happen.

    If you consider the clusters found by DBSCAN to be useful, train a classifier to put new instances into the same classes. You now want to perform classification, not rediscover structure.
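A minimal sketch of this approach, under several assumptions: the data is a synthetic stand-in generated with `make_blobs`, the `eps` and `min_samples` values are placeholders rather than a recommended configuration, and `KNeighborsClassifier` is just one possible choice of classifier. Any supervised model could be substituted.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the sample that fits in memory (assumption).
X_sample, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10]],
    cluster_std=0.3,
    random_state=0,
)

# Step 1: cluster the sample with DBSCAN (placeholder parameters).
db = DBSCAN(eps=1.0, min_samples=5).fit(X_sample)
labels = db.labels_

# Step 2: drop the points DBSCAN marked as noise (label -1),
# so the classifier only learns the actual clusters.
mask = labels != -1

# Step 3: train a classifier on the cluster labels.
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_sample[mask], labels[mask])

# Step 4: assign new points to the existing clusters without re-clustering.
X_new = np.array([[0.2, -0.1], [9.8, 10.3]])
predicted = clf.predict(X_new)
print(predicted)
```

Note that a classifier trained this way assigns every new point to one of the existing clusters. If new points should also be able to count as noise, one option is to keep the `-1` points in the training set as their own class.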