python, cluster-analysis, hierarchical-clustering

IndexError: index 8 is out of bounds for axis 1 with size 8


I'm working on a project about hierarchical clustering, and I'm writing code that runs AgglomerativeClustering with every possible combination of 'affinity' and 'linkage', two parameters you can set. The problem arises when I fit the model to the data. The dataset has shape (1300, 8); it was read with 'index_col=0' to get rid of the useless first column, which leaves 8 columns.

The for loop over linkage works fine when run on its own; the problem is in the affinity loop.

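from sklearn.cluster import AgglomerativeClustering

dataset = ...  # csv file, read with index_col=0; shape (1300, 8)
aff = ["l1", "l2", "manhattan", "cosine", "precomputed", "euclidean"]
link = ["complete", "average", "single"]

# Try every affinity/linkage combination
for a in aff:
    for l in link:
        ds = dataset
        ac_tune = AgglomerativeClustering(n_clusters=5, affinity=a, linkage=l)
        ac_tune.fit(ds)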

The error is the following:

IndexError: index 8 is out of bounds for axis 1 with size 8

Solution

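  • It fails on the "precomputed" affinity. With this option, fit expects a square distance matrix of shape (n_samples, n_samples), here (1300, 1300), instead of the raw data. Given raw data of shape (1300, 8), the estimator most likely treats it as a square matrix and tries to read columns past the eighth, which would explain the IndexError.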

    https://scikit-learn.org/stable/modules/generated/sklearn.cluster.AgglomerativeClustering.html
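
A minimal sketch of one way to keep "precomputed" in the loop, assuming Euclidean distances are acceptable for that case. The data here is a synthetic stand-in for the real dataset, and the names X, metric_kw and labels are illustrative. The inspect check is there because newer scikit-learn releases call this parameter `metric` rather than `affinity`; drop it if you know which name your version uses.

```python
import inspect

import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import pairwise_distances

# Synthetic stand-in for the real (1300, 8) dataset (assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))

# Pick whichever keyword the installed scikit-learn accepts.
params = inspect.signature(AgglomerativeClustering.__init__).parameters
metric_kw = "metric" if "metric" in params else "affinity"

aff = ["l1", "l2", "manhattan", "cosine", "precomputed", "euclidean"]
link = ["complete", "average", "single"]

labels = {}
for a in aff:
    for l in link:
        if a == "precomputed":
            # "precomputed" needs a square (n_samples, n_samples) distance
            # matrix; Euclidean is an assumption, any valid metric works.
            data = pairwise_distances(X, metric="euclidean")
        else:
            data = X
        model = AgglomerativeClustering(n_clusters=5, linkage=l, **{metric_kw: a})
        labels[(a, l)] = model.fit_predict(data)
```

If a precomputed run adds nothing beyond the metrics already in the list, simply removing "precomputed" from aff also avoids the error.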