Tags: python, scikit-learn, pca, sklearn-pandas

ValueError: shapes (2,2) and (4,6) not aligned: 2 (dim 1) != 4 (dim 0)


The error is raised by this line:

log_centers = pca.inverse_transform(centers)

Code:

from sklearn.cluster import KMeans

# TODO: Apply your clustering algorithm of choice to the reduced data
clusterer = KMeans(n_clusters=2, random_state=0).fit(reduced_data)

# TODO: Predict the cluster for each data point
preds = clusterer.predict(reduced_data)

# TODO: Find the cluster centers
centers = clusterer.cluster_centers_

# Map the centers back to the log-scaled feature space (this line raises the ValueError)
log_centers = pca.inverse_transform(centers)

Data:

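import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Log-scale the features, then drop the rows flagged as outliers
log_data = np.log(data)
good_data = log_data.drop(log_data.index[outliers]).reset_index(drop=True)

# Fit a 2-component PCA and project the cleaned data onto it
pca = PCA(n_components=2)
pca = pca.fit(good_data)
reduced_data = pca.transform(good_data)
reduced_data = pd.DataFrame(reduced_data, columns=['Dimension 1', 'Dimension 2'])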

data is loaded from a CSV file; its first rows look like:

   Fresh  Milk  Grocery  Frozen  Detergents_Paper  Delicatessen
0  14755   899     1382    1765                56           749
1   1838  6380     2824    1218              1216           295
2  22096  3575     7041   11422               343          2564

Solution

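  • The problem is the argument passed to pca.inverse_transform().

    According to the documentation, it expects data in the space produced by that PCA, i.e. an array of shape (n_samples, n_components), not arbitrary centroids. The error message shows the mismatch: centers has shape (2, 2), whereas the fitted pca has components_ of shape (4, 6), so the pca object in scope was fitted with 4 components and cannot invert 2-column KMeans centroids. Passing centroids only works when they were computed in the output space of that same pca.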
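
A minimal sketch of the shape requirement, on synthetic data rather than the CSV from the question. The random good_data and the second estimator pca4 are assumptions made for illustration. The idea that the pca in the question was refit with 4 components is only inferred from the (4, 6) shape in the error message; the posted code does not show it.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Synthetic stand-in for good_data: 100 samples, 6 positive features (assumption).
rng = np.random.default_rng(0)
good_data = np.log(rng.uniform(1.0, 1000.0, size=(100, 6)))

# Fit a 2-component PCA and cluster in the reduced space.
pca = PCA(n_components=2).fit(good_data)
reduced_data = pca.transform(good_data)
clusterer = KMeans(n_clusters=2, random_state=0, n_init=10).fit(reduced_data)
centers = clusterer.cluster_centers_

# The input to inverse_transform must have as many columns as the PCA has components.
assert centers.shape[1] == pca.components_.shape[0]

# Same pca that produced reduced_data: the inversion succeeds.
log_centers = pca.inverse_transform(centers)
true_centers = np.exp(log_centers)  # undo the log scaling

# A PCA fitted with a different number of components rejects the same centers.
pca4 = PCA(n_components=4).fit(good_data)
try:
    pca4.inverse_transform(centers)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```

If this matches the situation in the question, checking pca.components_.shape just before the failing line should reveal whether pca still refers to the 2-component fit.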