Tags: python, scikit-learn, cluster-analysis, dbscan

How to use DBSCAN method from sklearn for clustering


I have a dataset with three features that I want to cluster. With KMeans from sklearn I can easily plot the result like this (val is my dataset, with shape (3000, 3)):

import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
y_pred = KMeans(n_clusters=4, random_state=0).fit_predict(val)
fig = plt.figure()
ax1 = fig.add_subplot(1,1,1,projection='3d')
ax1.scatter(val[:, 0], val[:, 1], val[:, 2], c=y_pred)
plt.show()

For DBSCAN, however, I only have the following:

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler
val = StandardScaler().fit_transform(val)
db = DBSCAN(eps=3, min_samples=4).fit(val)
labels = db.labels_
core_samples = np.zeros_like(labels, dtype=bool)
core_samples[db.core_sample_indices_] = True

# Number of clusters in labels, ignoring noise if present.
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
n_noise_ = list(labels).count(-1)

How can I plot the DBSCAN result the same way as I do for KMeans?


Solution

  • You can reuse the plotting code from your KMeans model. All you need to do is re-assign val and y_pred so that the noise points (label -1) are excluded.

    # DBSCAN snippet from the question
    from sklearn.cluster import DBSCAN
    from sklearn.preprocessing import StandardScaler
    val = StandardScaler().fit_transform(val)
    db = DBSCAN(eps=3, min_samples=4).fit(val)
    labels = db.labels_
    
    # keep only clustered points; DBSCAN labels noise as -1
    # (`core` holds every non-noise point, i.e. core and border samples)
    mask = labels != -1
    y_pred, core = labels[mask], val[mask]
    
    # plotting snippet from the question
    import matplotlib.pyplot as plt
    fig = plt.figure()
    ax1 = fig.add_subplot(1,1,1,projection='3d')
    ax1.scatter(core[:, 0], core[:, 1], core[:, 2], c=y_pred)
    plt.show()
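
If you would rather keep the noise points visible instead of dropping them, one option is to draw them in a separate scatter call. The following is only a minimal sketch: `plot_dbscan_3d` is a hypothetical helper name, the data comes from `make_blobs` as a synthetic stand-in for the (3000, 3) array in the question, and `eps=0.3` is an illustrative value for that synthetic data, not a recommendation for yours.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler


def plot_dbscan_3d(val, labels):
    """Scatter-plot 3-D points coloured by DBSCAN label; noise (-1) in grey."""
    noise = labels == -1
    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1, projection="3d")
    # clustered points, coloured by their cluster label
    ax.scatter(val[~noise, 0], val[~noise, 1], val[~noise, 2], c=labels[~noise])
    # noise points, drawn as grey crosses
    ax.scatter(val[noise, 0], val[noise, 1], val[noise, 2], c="grey", marker="x")
    return fig


# Synthetic stand-in for the dataset in the question (assumption)
val, _ = make_blobs(n_samples=3000, n_features=3, centers=4, random_state=0)
val = StandardScaler().fit_transform(val)

# eps is illustrative only; tune it for your own data
labels = DBSCAN(eps=0.3, min_samples=4).fit(val).labels_
fig = plot_dbscan_3d(val, labels)
```

Call `plt.show()` afterwards to display the figure. Passing your own standardized `val` and `db.labels_` to the helper should work the same way, as long as `val` has three columns.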