I have the dataset that generates the following code.
X_moons, y_moons = datasets.make_moons(n_samples=1000, noise=.07, random_state=42)
The case is that I would like to make a dendrogram (bottom-up) in Python and I must select a linkage criterion. If you consult the documentation of the function you can see the existing methods. https://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.linkage.html
Any suggestions on how I can move forward? Is there a foolproof way to determine the best linkage?
I have tested the cophenetic distance for my dataset with each of the methods.
There is no direct way to know which linkage is best. However, by looking at spread of data we can best guess. For your case, single linkage will produce best result.
from sklearn.cluster import AgglomerativeClustering
fig, ax = plt.subplots(1,4,figsize=(20,5))
link =['single','complete','average','ward']
for i in range(4):
model = AgglomerativeClustering(n_clusters=2, linkage=link[i])
labels = model.fit_predict(X_moons)
ax[i].scatter(X_moons[:,0],X_moons[:,1], c=labels)
ax[i].set_title(link[i])
fig.show()
Further Reading: https://www.youtube.com/watch?v=VMyXc3SiEqs