
How to define dimensions in fviz_cluster with PAM data?


I have a data frame with samples in rows and variables in columns.

I run a PCA on it:

   library(FactoMineR)  # provides PCA()
   df.pca <- PCA(df, graph = FALSE, ncp = Inf)
   df.coord <- data.frame(df.pca$ind$coord)

and then k-means on my PCA data:

   df.kmeans <- kmeans(df.coord, 3, nstart = 25)

and to visualize cluster formation:

   library(factoextra)  # provides fviz_cluster()
   fviz_cluster(object = df.kmeans, data = df.pca)

I get a nice graph with the correct dimensions (Dim1 75% and Dim2 12% for my data, as calculated by the PCA).

But if I do the exact same thing with the k-medoids algorithm (PAM):

   library(cluster)  # provides pam()
   df.pca <- PCA(df, graph = FALSE, ncp = Inf)
   df.coord <- data.frame(df.pca$ind$coord)
   df.pam <- pam(df.coord, 3, nstart = 25)

   fviz_cluster(object = df.pam, data = df.pca)

I get incorrect dimensions (Dim1 3.4%, Dim2 3.4%) with the exact same data.

How can I set the plot dimensions to those from the PCA?

I tried:

    fviz_cluster(object = df.pam, data = df.coord)
    fviz_cluster(object = df.pam, data = df)

with no success: I always get 3.4% on both dimensions, which is not even close to the PCA values.


Solution

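  • Well, after some exploring I found the answer. As far as I can tell, the extra PCA does not come from pam() itself but from the plotting step: fviz_cluster() runs its own PCA whenever the data has more than two columns, and for a pam object it uses the data stored in the object (here, the PCA coordinates). So basically I was doing a PCA on PCA data, which makes no sense at all. If you are going to use pam(), or any other clustering algorithm, check whether a PCA is already done automatically somewhere in your pipeline!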
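
The effect described above can be reproduced with base R alone. The following is a minimal sketch, not the original analysis: it uses prcomp() and the built-in USArrests data as stand-ins for PCA() and df, and it assumes the plotting step standardizes the coordinates before its own PCA (the `stand` argument of fviz_cluster() defaults to TRUE). The helper pct() is hypothetical and only there for illustration.

```r
# Stand-in data: the built-in USArrests set replaces the original df.
pca1 <- prcomp(USArrests, center = TRUE, scale. = TRUE)
scores <- pca1$x  # PCA coordinates, one column per component

# Hypothetical helper: percentage of variance explained per component.
pct <- function(fit) 100 * fit$sdev^2 / sum(fit$sdev^2)

# Second PCA after standardizing the scores. The columns are uncorrelated
# and now all have unit variance, so every component explains 100 / p percent.
pca_scaled <- prcomp(scale(scores), center = FALSE, scale. = FALSE)

# Second PCA on the raw scores. The variances are untouched, so the
# percentages of the first PCA are preserved.
pca_raw <- prcomp(scores, center = FALSE, scale. = FALSE)

print(round(pct(pca1), 1))        # percentages from the first PCA
print(round(pct(pca_scaled), 1))  # identical values for all components
print(round(pct(pca_raw), 1))     # same as the first PCA
```

With four variables, the standardized version gives 25% per component. The same mechanism would give roughly 3.4% per component with about 29 of them, which matches the values in the question. If the assumption holds, calling fviz_cluster(df.pam, stand = FALSE) should keep the PCA percentages, though this has not been checked against the original data.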