r ggbiplot

PCA plot with factoextra package. How do I change regular labels to sample names?


I've read similar questions here, but the answers didn't solve my problem. I have a table whose columns hold the levels of different pollutants, plus one column with the sample names. However, I can't find a way to change the point labels (currently numbers) to the sample names.

Here is my script:

read.csv2("PCA_ALL.csv", header=TRUE)->tabela
tabela

pca <- prcomp(~Ter+Hop+UCM+AHS+S_Chain+L_chain+Alkyl+HMW+LMW+TOC, scale = TRUE)

fviz_pca_biplot(pca, geom = c("point","text"), 
                addEllipses = TRUE, ggtheme = theme_gray(), 
                col.var = "black", repel=FALSE, 
                title = "PCA - GB", xlab="PC1 (39%)", ylab="PC2 (22%)") 

Does anyone know what I could do? Thanks in advance!

The first column of my table is named "area" and holds the sample names.


Solution

  • Here is a possible solution: set the row names of pca$x (the matrix of rotated data) to the sample names before plotting.

    tabela <- read.table(text="
             area  Ter  Hop   UCM   AHS S_Chain L_chain Alkyl  HMW   LMW  TOC
    1   2010-2004 0.71 3.10 119.4 136.8    0.10    3.48 11.50 7.70 16.70 1.19
    2   2004-1999 0.57 2.77  71.0  89.3    0.04    2.61  3.74 3.61  1.30 0.81
    3   1999-1993 0.87 3.10 117.7 132.1    0.04    2.77  6.38 3.08  1.94 0.90
    4   1993-1988 0.49 2.69  98.4 111.7    0.04    2.64  4.60 3.57  1.81 0.86
    5   1988-1982 1.44 4.43  80.0  93.9    0.05    3.10  7.27 3.84  2.92 1.06
    6   1982-1977 0.57 4.80  55.1  65.5    0.03    3.80  9.87 5.50  8.80 1.25
    7   1977-1971 0.62 3.16 174.7 190.8    0.04    3.15  6.00 3.58  1.51 1.08
    8   1971-1966 1.17 5.77 162.3 174.8    0.04    3.39  5.95 5.68  2.65 1.11
    9   1966-1960 1.28 8.13 155.4 194.6    0.05    5.61  5.74 5.47  2.45 1.16
    10 1960-1954  0.69 6.77  96.0 134.2    0.04    5.51  3.74 4.73  2.41 1.16
    11  1954-1949 0.75 4.56  65.8 122.6    0.07    4.91  7.97 5.33  2.48 0.83
    12  1949-1943 0.58 4.74  70.2 112.3    0.05    5.38  4.94 7.47  3.19 1.19
    13  1943-1938 3.00 6.22  66.9 105.9    0.08    5.78 16.20 8.40  3.79 1.19
    14  1938-1932 0.77 3.44  96.4 141.7    0.06    4.93  8.48 3.60  4.12 1.06
    15  1932-1927 0.40 4.37  84.7 126.3    0.04    4.36  3.73 3.67  1.66 0.90
    16  1927-1921 1.95 5.06  51.8  92.5    0.07    5.08 11.74 4.36  5.29 1.01
    ", header=TRUE)
    
    pca <- prcomp(~Ter+Hop+UCM+AHS+S_Chain+L_chain+Alkyl+HMW+LMW+TOC, scale = TRUE, data=tabela)
    # Set row names for the matrix with rotated data
    rownames(pca$x) <- as.character(tabela$area)
    
    library(factoextra)
    fviz_pca_biplot(pca, geom = c("point","text"), 
                    addEllipses = TRUE, ggtheme = theme_gray(), 
                    col.var = "black", repel=FALSE, 
                    title = "PCA - GB", xlab="PC1 (39%)", ylab="PC2 (22%)") 
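
A variation on the same idea, offered as a minimal sketch rather than the answer's own method: assign the sample names as row names of the data frame before calling prcomp(), so the score matrix inherits them and no fix-up is needed afterwards. The object names `demo` and `pca_demo` are made up for illustration; `demo` holds only the first five rows and three pollutant columns of the table above.

```r
# First five rows and three columns of the table above (illustration only)
demo <- data.frame(
  area = c("2010-2004", "2004-1999", "1999-1993", "1993-1988", "1988-1982"),
  Ter  = c(0.71, 0.57, 0.87, 0.49, 1.44),
  Hop  = c(3.10, 2.77, 3.10, 2.69, 4.43),
  UCM  = c(119.4, 71.0, 117.7, 98.4, 80.0)
)

# Use the sample names as row names of the data frame
rownames(demo) <- as.character(demo$area)

# The formula interface keeps the row names, so they end up on pca_demo$x
pca_demo <- prcomp(~ Ter + Hop + UCM, data = demo, scale. = TRUE)

# Inspect the labels that a biplot of pca_demo would use for the points
print(rownames(pca_demo$x))
```

Passing `pca_demo` to fviz_pca_biplot() as in the call above should then label the points with the sample names.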
    

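
The axis titles in the call above hardcode the percentages of explained variance. As a hedged sketch, they can instead be computed from the prcomp() result so they stay correct if the data change. The names `demo`, `pca_demo`, `var_explained`, `xlab_txt` and `ylab_txt` are hypothetical, and `demo` is again only a small subset of the table, so its percentages will differ from those of the full data.

```r
# First five rows and three columns of the table above (illustration only)
demo <- data.frame(
  Ter = c(0.71, 0.57, 0.87, 0.49, 1.44),
  Hop = c(3.10, 2.77, 3.10, 2.69, 4.43),
  UCM = c(119.4, 71.0, 117.7, 98.4, 80.0)
)

pca_demo <- prcomp(~ Ter + Hop + UCM, data = demo, scale. = TRUE)

# Share of total variance carried by each principal component
var_explained <- pca_demo$sdev^2 / sum(pca_demo$sdev^2)

# Build axis titles in the same style as the hardcoded ones
xlab_txt <- sprintf("PC1 (%.0f%%)", 100 * var_explained[1])
ylab_txt <- sprintf("PC2 (%.0f%%)", 100 * var_explained[2])

print(c(xlab_txt, ylab_txt))
```

These strings can then be passed as `xlab = xlab_txt, ylab = ylab_txt` in place of the literal titles.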