Tags: python, arrays, python-2.7, clustered-index

How to get DBSCAN results for the original data, based on the example from http://scikit-learn.org/


Referring to this example of using DBSCAN, the real input data for the clustering process is 'X'. But following the example, I used 'X1' to build the clustering model.

# -*- coding: utf-8 -*-
"""
===================================
Demo of DBSCAN clustering algorithm
===================================

Finds core samples of high density and expands clusters from them.

"""
#print(__doc__)

import numpy as np

from sklearn.cluster import DBSCAN
from sklearn import metrics
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler


# Generate sample data
centers = [[1, 1], [-1, -1], [1, -1]]
X=[(9,0),(7,8),(8,6),(1,2),(1,3),(7,6),(10,14)]

X1 = StandardScaler().fit_transform(X)
##############################################################################
# Compute DBSCAN
db = DBSCAN(eps=0.3, min_samples=10).fit(X1)
core_samples_mask = np.zeros_like(db.labels_, dtype=bool)  # all-False array with the same shape as db.labels_

core_samples_mask[db.core_sample_indices_] = True  # set True at the indices of the core samples
labels = db.labels_

print "cluster: ", set(labels)

# Number of clusters in labels, ignoring noise if present.
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)

In this case I want to get the noise members, so I print xy when k == -1. Unfortunately, xy refers to X1, not the real data X.

# Plot result
import matplotlib.pyplot as plt

# Black removed and is used for noise instead.
unique_labels = set(labels)
colors = plt.cm.Spectral(np.linspace(0, 1, len(unique_labels)))

for k, col in zip(unique_labels, colors):
    class_member_mask = (labels == k)
    if k == -1:
        # Black used for noise.
        xy = X1[class_member_mask]
        print "Noise :", xy
    else:
        # Core samples of cluster k (large markers)
        xy = X1[class_member_mask & core_samples_mask]
        plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=col,
                 markeredgecolor='k', markersize=14)

        # Non-core samples of cluster k (small markers)
        xy = X1[class_member_mask & ~core_samples_mask]
        plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=col,
                 markeredgecolor='k', markersize=6)

plt.title('Estimated number of clusters: %d' % n_clusters_)
plt.show()

When I replace X1 with X, I get an error.

xy = X[class_member_mask]

Error:

xy=X[class_member_mask&~core_samples_mask]
TypeError: only integer arrays with one element can be converted to an index

Maybe it is because X1 and X have different formats. I think this would be solved if I knew how to convert X to the same format as X1.

X=[(9,0),(7,8),(8,6),(1,2),(1,3),(7,6),(10,14)]
X1=[[ 0.8406627  -1.30435512]
   [ 0.25219881  0.56856505]
   [ 0.54643076  0.10033501]
   [-1.51319287 -0.83612508]
   [-1.51319287 -0.60201006]
   [ 0.25219881  0.10033501]
   [ 1.13489465  1.97325518]]  

Any suggestions would be appreciated.


Solution

  • Convert X1 to a NumPy array:

    X1=[[ 0.8406627,  -1.30435512],
       [ 0.25219881,  0.56856505],
       [ 0.54643076,  0.10033501],
       [-1.51319287, -0.83612508],
       [-1.51319287, -0.60201006],
       [ 0.25219881,  0.10033501],
       [ 1.13489465,  1.97325518]]
    
    X1 =  np.asarray(X1)
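
The same conversion can be applied to the original list X. In the question, X is a plain Python list of tuples, and a list cannot be indexed with a boolean mask, which is what raises the TypeError. Below is a minimal sketch of pulling the noise members out of the unscaled data. The names X_arr, noise_mask and noise_points are introduced here for illustration only, and labels is a made-up stand-in for db.labels_, not real DBSCAN output:

```python
import numpy as np

# Original (unscaled) data from the question, as a plain Python list.
X = [(9, 0), (7, 8), (8, 6), (1, 2), (1, 3), (7, 6), (10, 14)]

# Hypothetical stand-in for db.labels_ (values invented for illustration);
# -1 marks noise, following the scikit-learn convention.
labels = np.array([-1, 0, 0, 1, 1, 0, -1])

# Convert the list to a NumPy array so that boolean-mask indexing works.
X_arr = np.asarray(X)

# Select the rows of the original data whose label is -1 (noise).
noise_mask = (labels == -1)
noise_points = X_arr[noise_mask]

print("Noise: %s" % noise_points.tolist())
```

Because X1 is computed row by row from X, the mask built from the labels lines up with the rows of X_arr, so the selected rows are the original coordinates rather than the standardized ones.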