Tags: r, dataframe, pca

How can I perform a PCA with variables of three different data frames and color discriminate them?


I have three data frames and want to run a Principal Component Analysis (PCA) on them in R. I merged the data frames with rbind() and ran the PCA on the result, which worked. However, I want to color the points according to the data frame they came from. With the merged data frame that seems impossible (or is it?). When I call PCA(X=c(df1,df2,df3)) instead, it complains about differing numbers of rows (which is indeed the case).

pca <- PCA(X=c(df1,df2,df3))
fviz_pca_ind(pca,
             geom.ind = "point", # show points only (but not "text")
             col.ind = c(df1,df2,df3), # color by groups
             palette = c("#00AFBB", "#E7B800", "#FC4E07"),
             addEllipses = TRUE, # Concentration ellipses
             legend.title = "Groups"
             )

That is not working...

How can I perform a PCA with variables from three different data frames and color the points by data frame? I have no reprex because one is difficult to provide in this case.

Thank you all for your suggestions ;)


Solution

  • You need the number of rows of each data frame. One way is shown below, where I collect the three data frames in a named list:

    library(FactoMineR)
    library(factoextra)
    
    df1 = subset(iris,Species=="setosa")[,-5]
    df2 = subset(iris,Species=="versicolor")[,-5]
    df3 = subset(iris,Species=="virginica")[,-5]
    
    X = list(df1=df1,df2=df2,df3=df3)
    

    The data frames are combined with do.call(rbind, X) in the PCA() call further down. The labels repeat each data frame's name once for each of its rows:

    labels = rep(names(X),sapply(X,nrow))
    table(labels)
    

    Then plot, passing labels as col.ind:

    pca <- PCA(do.call(rbind,X))
    fviz_pca_ind(pca,
                 geom.ind = "point", # show points only (but not "text")
                 col.ind = labels, # color by groups
                 palette = c("#00AFBB", "#E7B800", "#FC4E07"),
                 addEllipses = TRUE, # Concentration ellipses
                 legend.title = "Groups"
    )
    

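To see why c(df1, df2, df3) fails in the question while the rbind() route works, here is a minimal base R sketch. It reuses the iris split from the answer as stand-in data. The names flat, merged and group are only illustrative and do not come from the original post.

```r
# Base R only; iris stands in for the three data frames, as in the answer.
df1 <- subset(iris, Species == "setosa")[, -5]
df2 <- subset(iris, Species == "versicolor")[, -5]
df3 <- subset(iris, Species == "virginica")[, -5]
X <- list(df1 = df1, df2 = df2, df3 = df3)

# c() on data frames concatenates their columns into one plain list,
# so it does not stack rows the way rbind() does.
flat <- c(df1, df2, df3)
is.data.frame(flat)  # FALSE
length(flat)         # 12: four columns from each of the three data frames

# Stack the rows and build one label per row, in the same order as the list.
merged <- do.call(rbind, X)
labels <- rep(names(X), sapply(X, nrow))

# Sanity checks: one label per row, and the rows labelled "df2" are df2's rows.
stopifnot(length(labels) == nrow(merged))
stopifnot(all(as.matrix(merged[labels == "df2", ]) == as.matrix(df2)))

# Optionally keep the origin as a factor column (hypothetical name "group").
merged$group <- factor(labels)
table(merged$group)
```

The important part is that labels is built in the same order in which do.call(rbind, X) stacks the rows, so the i-th label always belongs to the i-th row. If you keep the group column in the merged data frame, remember that it is not numeric: either drop it before calling PCA(), or, as far as I know, declare it through PCA()'s quali.sup argument and color by it with the habillage argument of fviz_pca_ind(). I have not run that variant here, so check the package documentation first.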