r ggplot2 pca

Problems Plotting PCA in R with ggplot2


I am trying to plot a PCA of my data, and when I run the code I run into the following issues.

Furthermore, can anyone help me take my data and code and produce a PLS-DA plot like the one in the picture? I couldn't find any good tutorials.

How can I resolve this issue? The plots should look like this:

[image]

So after some help I got this far:

My code:


library(ggplot2)
library(ggforce)

all_datanoT <- cbind(amino,sphingo,hexose,phospha,lyso,cleaned_xl_Kopie)
all_datawT <- cbind(aminotnos,sphingo,hexose,phospha,lyso,cleaned_xl_Kopie)
rownames(all_datawT) <- sample_id$`Sample Identification`


alldata_naomit <-na.omit(all_datanoT)
all_datawTnaomit <-na.omit(all_datawT)

mypr <- prcomp(log2(alldata_naomit), scale = TRUE)
summary(mypr)

str(mypr)
mypr$x


PC1 <- mypr$x[, 1]
PC2 <- mypr$x[, 2]
pcat <- cbind(all_datawTnaomit, PC1, PC2)



ggplot(  
  data = pcat,
  aes(
    x = PC1,
    y = PC2,
    fill = 'Time point',
    line = 1
  ),
  shape = 1
) +
  geom_point(
    shape = 21,
    colour = "black",
    size = 2,
    stroke = 0.5,
    alpha = 0.6
  ) +
  scale_fill_brewer(palette = "Set1") +
  scale_color_brewer(palette = "Set1") +
  geom_mark_ellipse(
    aes(
      fill = 'Time point',
      color = 'Time point'
    ),
    alpha = 0.05
  ) 

which produces the following plot:

[image]

How can I get it to use the two Time point values, T0 and T1, to draw two separate ellipses? And how can I easily impute my data so that the NAs are replaced by, for example, the column means, instead of omitting them just so I can plot?

Original sample data with dput():

dput(pcat[sample(nrow(pcat), 50), ])

https://gist.github.com/bicvn/47d97929a63ff99e9b260e8658407ae3

New dput:

https://gist.github.com/bicvn/b06279c6bfa641303b57a3ad2cc07a21


Solution

  • Here is an example. The trick is to use Comps <- as.data.frame(mypca$x) to isolate the components, then use cbind() with Comps[, c(1, 2)] to add only the first two components to the original data. I used the iris dataset here:

    library(ggplot2)
    library(ggforce)
    #Data
    data("iris")
    #PCA
    mypca <- prcomp(iris[,-5])
    #Isolate components
    Comps <- as.data.frame(mypca$x)
    #Extract components and bind to original data
    newiris <- cbind(iris,Comps[,c(1,2)])
    #Plot
    ggplot(newiris, aes(x=PC1, y=PC2, col = Species, fill = Species)) +
      stat_ellipse(geom = "polygon", col= "black", alpha =0.5)+
      geom_point(shape=21, col="black")
    

    Output:

    enter image description here
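The iris plot gets one ellipse per species because Species is passed to aes() as a bare column name. A quoted string such as 'Time point' is treated as a constant value instead, which is likely why the plot in the question shows a single group. The sketch below illustrates the difference on iris; `n_fills` is a hypothetical helper written for this example, not part of ggplot2. Column names containing spaces need backticks rather than quotes.

```r
library(ggplot2)

# Hypothetical helper: count the distinct fill values ggplot2 actually draws
n_fills <- function(p) length(unique(ggplot_build(p)$data[[1]]$fill))

# Quoted: "Species" is a constant string, so every point gets the same fill
p_const <- ggplot(iris, aes(Sepal.Length, Sepal.Width, fill = "Species")) +
  geom_point(shape = 21)

# Bare name: Species is looked up as a column, giving one fill per level
p_col <- ggplot(iris, aes(Sepal.Length, Sepal.Width, fill = Species)) +
  geom_point(shape = 21)

n_fills(p_const)  # number of fills with the quoted string
n_fills(p_col)    # number of fills with the column mapping
```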

    For the data you shared, just skip the na.omit() step. Note that `Time point` is wrapped in backticks, not quotes, so ggplot2 reads it as a column. Here is the code and output with your data:

    #Code
    ggplot(pcat, aes(x=PC1, y=PC2, col = `Time point`, fill = `Time point`)) +
      stat_ellipse(geom = "polygon", col= "black", alpha =0.5)+
      geom_point(shape=21, col="black")
    

    Output:

    enter image description here
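The question also asked how to replace NAs with column means rather than dropping rows. A minimal base R sketch on a made-up data frame follows; `toy` and `impute_col_means` are hypothetical names used only for illustration. With the data from the question, the same function could presumably be applied to all_datanoT in place of na.omit() before calling prcomp().

```r
# Toy numeric data with missing values (made up for illustration)
toy <- data.frame(a = c(1, NA, 3), b = c(4, 6, NA))

# Replace each NA in a numeric column with that column's mean,
# where the mean is computed from the non-missing values only
impute_col_means <- function(df) {
  df[] <- lapply(df, function(col) {
    if (is.numeric(col)) col[is.na(col)] <- mean(col, na.rm = TRUE)
    col
  })
  df
}

imputed <- impute_col_means(toy)

# The imputed data has no NAs, so prcomp() can use every row
mypr_toy <- prcomp(imputed, scale. = TRUE)
```

Mean imputation keeps all samples in the plot, but it shrinks the variance of the imputed columns, so the resulting PCA is best read with that in mind.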