python cluster-analysis dbscan

How to get the confidence of a clustering created by DBSCAN in Python


I used sklearn's dbscan in Python, and the result only gives the cluster label of each point. I also want to calculate the confidence of the clustering, or at least the average distance of the clusters from each other.

Do you guys have any idea?


Solution

  • I don't think this functionality is supported by scikit-learn. Cluster confidence is not really a thing here, as DBSCAN does not use cluster probabilities. However, calculating cluster distances is relatively straightforward.

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.cluster import dbscan
    
    
    # Get data & labels
    data = load_iris()['data']
    labels = dbscan(data)[1]
    
    # Cluster ids in sorted order, without the noise label (-1)
    clusters = [c for c in np.unique(labels) if c != -1]
    
    # Initialize results (one row/column per real cluster)
    cluster_means = np.zeros((len(clusters), data.shape[1]))
    cluster_distances = np.zeros((len(data), len(clusters)))
    
    # Loop through clusters; i always indexes a valid row/column
    # because noise was removed before enumerating
    for i, cluster in enumerate(clusters):
        # Get cluster mean
        cluster_mean = np.mean(data[labels == cluster], axis=0)
    
        # Set cluster mean
        cluster_means[i, :] = cluster_mean
    
        # Set distance from every point to this cluster mean
        cluster_distances[:, i] = np.linalg.norm(data - cluster_mean, axis=1)
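
For the "average distance of the clusters from each other" part of the question, one option is to take the pairwise distances between the cluster means. This is only a minimal sketch: it assumes Euclidean distance, `centroid_distances` is a hypothetical helper (not a scikit-learn function), and it runs on a tiny hand-made array instead of the iris data so the numbers are easy to check.

```python
import numpy as np


def centroid_distances(data, labels):
    """Pairwise Euclidean distances between cluster means.

    Hypothetical helper for illustration only; the noise label (-1) is ignored.
    """
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels)

    # Real cluster ids in sorted order, noise excluded
    clusters = [c for c in np.unique(labels) if c != -1]

    # One mean per cluster; reshape keeps a valid (0, d) array if all points are noise
    means = np.array([data[labels == c].mean(axis=0) for c in clusters])
    means = means.reshape(len(clusters), data.shape[1])

    # Broadcasting: (k, 1, d) - (1, k, d) -> (k, k, d), then norm over d
    diffs = means[:, None, :] - means[None, :, :]
    return clusters, np.linalg.norm(diffs, axis=2)


# Toy input (assumed data): two clusters plus one noise point
toy_data = np.array([[0.0, 0.0], [0.0, 2.0], [3.0, 5.0], [3.0, 5.0], [9.0, 9.0]])
toy_labels = np.array([0, 0, 1, 1, -1])

clusters, dist = centroid_distances(toy_data, toy_labels)

# Average distance between distinct clusters (upper triangle, diagonal excluded)
upper = dist[np.triu_indices(len(clusters), k=1)]
mean_between = upper.mean() if upper.size else float("nan")
```

With the real data you would pass `data` and `labels` from the snippet above instead of the toy arrays. Distances between means say nothing about cluster shape or spread, so treat the result as a rough separation measure rather than a confidence.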