I am using a template script and trying to feed in my own data. However, I am not sure what labels_true refers to, as the error states that it is undefined.
Here is my data array:
data=array([[5.71585827e+00, 3.32320000e+04],
[0.00000000e+00, 0.00000000e+00],
[0.00000000e+00, 0.00000000e+00],
...,
[9.57746479e-02, 3.40000000e+01],
[7.01388889e-01, 1.01000000e+02],
[9.70350404e-02, 3.60000000e+01]])
Now I am applying this script:
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn import metrics
from sklearn.preprocessing import StandardScaler
# #############################################################################
X = data
X = StandardScaler().fit_transform(X)
# #############################################################################
# Compute DBSCAN
db = DBSCAN(eps=0.3, min_samples=10).fit(X)
labels = db.labels_
print("Homogeneity: %0.3f" % metrics.homogeneity_score(labels_true, labels))
This fails with:
NameError: name 'labels_true' is not defined
From the documentation of scikit-learn's homogeneity_score (emphasis added):
Homogeneity metric of a cluster labeling given a ground truth.
where labels_true are ground truth class labels to be used as a reference.
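To make the role of labels_true concrete, here is a small sketch with made-up toy labels (not the question's data). The three cases follow the kind of examples given in the scikit-learn documentation for homogeneity_score: the score only asks whether each predicted cluster contains samples of a single true class, so the actual cluster ids do not matter.

```python
from sklearn import metrics

# Toy ground truth for four samples (illustrative values only)
labels_true = [0, 0, 1, 1]

# Same grouping, different cluster ids: perfectly homogeneous
perfect = metrics.homogeneity_score(labels_true, [1, 1, 0, 0])

# Each class split into more clusters: every cluster is still "pure"
split = metrics.homogeneity_score(labels_true, [0, 1, 2, 3])

# A single cluster mixing both classes: not homogeneous at all
merged = metrics.homogeneity_score(labels_true, [0, 0, 0, 0])

print("%.3f %.3f %.3f" % (perfect, split, merged))  # prints: 1.000 1.000 0.000
```

Without the first argument there is nothing to compare against, which is why the call cannot work on its own.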
So, if you already have the ground truth, that is what goes into the labels_true argument; it is compared with your predicted labels to give the score.
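For illustration, here is a sketch of a case where a ground truth does exist. It assumes synthetic data from make_blobs, which returns the true class of every sample alongside the features. The question's template appears to be adapted from scikit-learn's DBSCAN demo, which obtains labels_true in this way; with your own data, labels_true would instead have to be an array of known classes, one per row of data.

```python
from sklearn import metrics
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler

# Synthetic data WITH known classes: make_blobs returns (X, y),
# and y is the ground truth that plays the role of labels_true
centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(
    n_samples=750, centers=centers, cluster_std=0.4, random_state=0
)
X = StandardScaler().fit_transform(X)

# Same clustering step as in the question's script
db = DBSCAN(eps=0.3, min_samples=10).fit(X)
labels = db.labels_

# Ground truth first, predicted cluster labels second
score = metrics.homogeneity_score(labels_true, labels)
print("Homogeneity: %0.3f" % score)
```

Note that labels_true and labels must have one entry per sample, in the same order.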
Here the error occurs because you have not provided any such ground truth: labels_true is never defined in your script, exactly as the error says.
As a direct consequence, if no ground truth is available, this metric cannot be used.
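A minimal sketch of one way to handle this, assuming you want the script to keep running without a ground truth: the hypothetical helper report below (not part of scikit-learn) always computes label-only statistics, namely the number of clusters and of noise points (DBSCAN marks noise with -1), and calls homogeneity_score only when labels_true is supplied.

```python
from sklearn import metrics


def report(labels, labels_true=None):
    """Hypothetical helper: summarise a DBSCAN labeling.

    Ground-truth metrics are computed only if labels_true is given.
    """
    labels = list(labels)
    # DBSCAN assigns the label -1 to noise points, so exclude it from the count
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    n_noise = labels.count(-1)
    result = {"n_clusters": n_clusters, "n_noise": n_noise}
    if labels_true is not None:
        # Only meaningful when a reference labeling actually exists
        result["homogeneity"] = metrics.homogeneity_score(labels_true, labels)
    return result


print(report([0, 0, 1, 1, -1]))  # {'n_clusters': 2, 'n_noise': 1}
```

With the question's script, report(db.labels_) works on the data as-is, and report(db.labels_, labels_true) would add the homogeneity score once a ground truth exists.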