machine-learning statistics data-science analytics

Clustering models like DBSCAN, OPTICS, KMeans


After clustering with any of these algorithms, is it possible to segment new data based on what was learned from the previous data?


Solution

  • Clustering algorithms are unsupervised learning algorithms: they don't need a dependent variable, and they find structure or similarity among the data points rather than predicting classes. What you can do is treat the clustered data as your supervised data.

    The approach is to cluster the train data and assign each point its cluster label, treat the result as multi-class classification data, train a multi-class classification model on it, and then apply that model to the test data.

    Let train and test be the datasets.
    clusters <- Clustering(train)
    train[y] <- clusters
    model <- Classification(train, train[y])
    prediction <- model.predict(test)
    

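    Interestingly, KMeans in sklearn provides both fit and predict methods, so a fitted KMeans model can assign new data to the nearest learned centroid. DBSCAN, however, has no predict method, which follows from how it works: its clusters are defined by density connectivity among the training points themselves, so there are no centroids to compare a new point against.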
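Both routes described above can be sketched in a few lines of Python with scikit-learn. This is only an illustration under stated assumptions: the synthetic blobs, the `eps` and `min_samples` values, and the choice of `KNeighborsClassifier` as the multi-class model are not from the original answer, and any classifier could take its place. DBSCAN noise points (label -1) are dropped before training here, which is one possible choice rather than a requirement.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic, well-separated blobs stand in for the real data (assumption).
X, _ = make_blobs(
    n_samples=600,
    centers=[[-5, -5], [0, 5], [5, -5]],
    cluster_std=0.6,
    random_state=0,
)
X_train, X_new = X[:500], X[500:]  # "previous" data and "new" data

# Route 1: KMeans keeps its centroids, so it can label unseen points directly.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_train)
kmeans_labels_new = kmeans.predict(X_new)

# Route 2: DBSCAN has no predict(), so cluster first, then train a classifier
# on the cluster labels and use the classifier for new points.
train_labels = DBSCAN(eps=1.5, min_samples=5).fit_predict(X_train)
mask = train_labels != -1  # drop points DBSCAN marked as noise (assumed choice)
clf = KNeighborsClassifier(n_neighbors=5).fit(X_train[mask], train_labels[mask])
dbscan_labels_new = clf.predict(X_new)

# Cluster ids are arbitrary, so compare the two labelings with a
# permutation-invariant score instead of checking equality.
agreement = adjusted_rand_score(kmeans_labels_new, dbscan_labels_new)
print("Adjusted Rand index between the two labelings:", agreement)
```

Note that the classifier in the second route will assign every new point to some cluster, even a point DBSCAN itself would have treated as noise, so an extra distance or confidence check may be worth adding if outliers matter.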