I have a data frame with samples in rows and variables in columns.
I run a PCA on it:
library(FactoMineR)  # PCA()
library(factoextra)  # fviz_cluster()
library(cluster)     # pam()
df.pca <- PCA(df, graph = FALSE, ncp = Inf)
df.coord <- data.frame(df.pca$ind$coord)
and then k-means on my PCA data:
df.kmeans = kmeans(df.coord, 3, nstart = 25)
and to visualize cluster formation:
fviz_cluster(object = df.kmeans, data = df.pca)
I get a nice graph with the correct dimensions (Dim1 75% and Dim2 12% for my data, as calculated by the PCA).
But if I do the exact same thing with the k-medoids algorithm (PAM):
df.pca <- PCA(df, graph = FALSE, ncp = Inf)
df.coord <- data.frame(df.pca$ind$coord)
df.pam = pam(df.coord, 3, nstart = 25)
fviz_cluster(object = df.pam, data = df.pca)
I get incorrect dimensions (Dim1 3.4%, Dim2 3.4%) with the exact same data.
How can I make the plot use the dimensions from the PCA?
I tried:
fviz_cluster(object = df.pam, data = df.coord)
fviz_cluster(object = df.pam, data = df)
with no success. I always get 3.4% for both dimensions, which is not even close to the PCA values.
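Well, after some simple exploring I found the answer. For a pam() object, a PCA is always done automatically at the plotting step: as far as I can tell, fviz_cluster() ignores the data argument, takes the data stored in the pam object, and runs its own PCA on it. So basically I was doing a PCA on PCA data, which makes no sense at all. If you are going to use pam(), or any other clustering algorithm, check whether a PCA is done automatically somewhere along the way!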
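To make the "PCA on PCA data" effect concrete, here is a minimal sketch. It rests on several assumptions that are not in the original post: it uses the built-in iris data as a stand-in for df, base R's prcomp() instead of FactoMineR's PCA(), and a plain plot() call instead of fviz_cluster(). It shows that standardizing PCA scores and running a second PCA gives every component the same share of variance (1/p for p components, which would be consistent with the identical 3.4% values above if there were roughly 29 components, though that is a guess), and that plotting the PAM clusters directly on the first two PCA scores keeps the original percentages.

```r
library(cluster)  # pam()

# Stand-in data (assumption): the four numeric columns of iris
x <- iris[, 1:4]

# First PCA, using base R's prcomp() in place of FactoMineR::PCA() (assumption)
pca <- prcomp(x, center = TRUE, scale. = TRUE)
scores <- as.data.frame(pca$x)

# Share of variance explained by each component of the first PCA
var_pca <- pca$sdev^2 / sum(pca$sdev^2)

# Standardize the scores and run a second PCA on them,
# mimicking a plotting function that scales its input and projects it again
pca2 <- prcomp(scale(scores), center = FALSE, scale. = FALSE)
var_pca2 <- pca2$sdev^2 / sum(pca2$sdev^2)

print(round(100 * var_pca, 1))   # unequal shares from the first PCA
print(round(100 * var_pca2, 1))  # equal shares: 25% each with 4 components

# k-medoids clustering on the PCA scores
fit <- pam(scores, k = 3)

# Plot the clusters on the original PCA axes, labelled with the original percentages
plot(scores$PC1, scores$PC2, col = fit$clustering, pch = 19,
     xlab = sprintf("Dim1 (%.1f%%)", 100 * var_pca[1]),
     ylab = sprintf("Dim2 (%.1f%%)", 100 * var_pca[2]))
```

If you would rather stay with fviz_cluster(), its documentation lists a stand argument (default TRUE) that standardizes the data before the internal PCA. Setting stand = FALSE may be enough to get the original axes back, but I have not verified that on the data from the question.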