The interpreter complains about this line:
log_centers = pca.inverse_transform(centers)
# TODO: Apply your clustering algorithm of choice to the reduced data
clusterer = KMeans(n_clusters=2, random_state=0).fit(reduced_data)
# TODO: Predict the cluster for each data point
preds = clusterer.predict(reduced_data)
# TODO: Find the cluster centers
centers = clusterer.cluster_centers_
log_centers = pca.inverse_transform(centers)
log_data = np.log(data)
good_data = log_data.drop(log_data.index[outliers]).reset_index(drop = True)
pca = PCA(n_components=2)
pca =
reduced_data = pca.transform(good_data)
reduced_data = pd.DataFrame(reduced_data, columns = ['Dimension 1', 'Dimension 2'])
data is loaded from a CSV file; the header and first rows look like this:
   Fresh  Milk  Grocery  Frozen  Detergents_Paper  Delicatessen
0  14755   899     1382    1765                56           749
1   1838  6380     2824    1218              1216           295
2  22096  3575     7041   11422               343          2564
The problem is that pca.inverse_transform() should not take the cluster centers as its argument.
According to the documentation, it expects the data obtained by applying the PCA to your original data (that is, the output of pca.transform()), not the centroids obtained with KMeans.
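
As a minimal sketch of the round trip the documentation describes, here is a fitted PCA transforming data and then mapping it back. The synthetic array below is only a stand-in for good_data (its size and values are made up for illustration), and fitting the PCA on that same array is an assumption, since the fitting line in the question is cut off.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for good_data: 50 samples, 6 features.
# This is an assumption for illustration, not the real CSV.
rng = np.random.default_rng(0)
good_data = rng.normal(size=(50, 6))

# Fit the PCA on the data, then project it onto 2 components.
pca = PCA(n_components=2).fit(good_data)
reduced_data = pca.transform(good_data)         # shape (50, 2)

# inverse_transform expects rows expressed in PCA space
# (n_components columns) and maps them back to feature space.
restored = pca.inverse_transform(reduced_data)  # shape (50, 6)

print(reduced_data.shape, restored.shape)
```

The pca object has to be fitted before either transform() or inverse_transform() is called, and the array passed to inverse_transform() has to have n_components columns.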