I'm working on a project about hierarchical clustering, and I'm writing code that runs AgglomerativeClustering with every possible combination of 'affinity' and 'linkage', two parameters you can set. The problem arises when I try to fit the model to the data. The dataset has shape (1300, 8) and was read with 'index_col=0' to get rid of a useless first column (the 8 columns are what remains after dropping it).
The for loop over linkage works fine when run on its own; the problem is with the loop over affinity.
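from sklearn.cluster import AgglomerativeClustering

dataset = ...  # DataFrame read from the CSV file (index_col=0), shape (1300, 8)
aff = ["l1", "l2", "manhattan", "cosine", "precomputed", "euclidean"]
link = ["complete", "average", "single"]
for a in aff:
    for l in link:
        ds = dataset
        ac_tune = AgglomerativeClustering(n_clusters=5, affinity=a, linkage=l)
        ac_tune.fit(ds)  # raises the IndexError below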
The error is the following:
IndexError: index 8 is out of bounds for axis 1 with size 8
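It fails when the loop reaches affinity="precomputed". With that option, fit expects a square distance matrix of shape (n_samples, n_samples), here (1300, 1300), instead of the raw (1300, 8) data. The input is then presumably indexed as if it were square, which would explain why column index 8 is out of bounds for an array with only 8 columns. The other affinities in your list take the raw data directly. See the documentation: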
https://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html
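One way to keep "precomputed" in the loop is to build the distance matrix once and pass it only for that option. The following is a minimal sketch, not a definitive fix: the random array X is a synthetic stand-in for your dataset, the choice of Euclidean distances for the precomputed matrix is an assumption (any pairwise distance would do), and metric_kw is a helper name introduced here because newer scikit-learn releases call the parameter 'metric' while older ones call it 'affinity'.

```python
import inspect

import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import pairwise_distances

# Synthetic stand-in for the (1300, 8) dataset from the question.
rng = np.random.default_rng(0)
X = rng.normal(size=(1300, 8))

# Pick whichever keyword this scikit-learn version accepts
# ("metric" in newer releases, "affinity" in older ones).
params = inspect.signature(AgglomerativeClustering.__init__).parameters
metric_kw = "metric" if "metric" in params else "affinity"

metrics = ["l1", "l2", "manhattan", "cosine", "precomputed", "euclidean"]
linkages = ["complete", "average", "single"]

# Assumption: Euclidean distances are used for the precomputed matrix.
D = pairwise_distances(X, metric="euclidean")  # square, shape (1300, 1300)

labels = {}
for m in metrics:
    # "precomputed" needs the square distance matrix; everything else takes raw data.
    data = D if m == "precomputed" else X
    for l in linkages:
        model = AgglomerativeClustering(n_clusters=5, linkage=l, **{metric_kw: m})
        labels[(m, l)] = model.fit_predict(data)
```

With a DataFrame, passing dataset.to_numpy() in place of X should behave the same way. Note that a precomputed Euclidean matrix just reproduces the "euclidean" runs, so "precomputed" is mainly useful when you have a custom distance.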