I have a dataset in form that column 1 contain treatments name and remaining columns contain values for those treatments and there are three replicates for each treatment. For illustration I have created simulated dataset using iris dataset as shown below:
df <- read.table(text = '"Treatment" "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
"treatment_a" 5.1 3.5 1.4 0.2
"treatment_a" 4.9 3 1.4 0.2
"treatment_a" 4.7 3.2 1.3 0.2
"treatment_b" 4.6 3.1 1.5 0.2
"treatment_b" 5 3.6 1.4 0.2
"treatment_b" 5.4 3.9 1.7 0.4
"treatment_c" 4.6 3.4 1.4 0.3
"treatment_c" 5 3.4 1.5 0.2
"treatment_c" 4.4 2.9 1.4 0.2
"treatment_d" 4.9 3.1 1.5 0.1
"treatment_d" 5.4 3.7 1.5 0.2
"treatment_d" 4.8 3.4 1.6 0.2
"treatment_e" 4.8 3 1.4 0.1
"treatment_e" 4.3 3 1.1 0.1
"treatment_e" 5.8 4 1.2 0.2
"treatment_f" 5.7 4.4 1.5 0.4
"treatment_f" 5.4 3.9 1.3 0.4
"treatment_f" 5.1 3.5 1.4 0.3
"treatment_g" 5.7 3.8 1.7 0.3
"treatment_g" 5.1 3.8 1.5 0.3
"treatment_g" 5.4 3.4 1.7 0.2
"treatment_h" 5.1 3.7 1.5 0.4
"treatment_h" 4.6 3.6 1 0.2
"treatment_h" 5.1 3.3 1.7 0.5', header = TRUE)
I want to perform a pca on this dataset using R in a way that treatments with replicates are plotted on the plot instead of variables, treatment names should also be labelled on the plot. I have looked for similar questions on stackoverflow but did not find one similar to my problem.
Are you looking to make a scatter plot with the first and second principle components plotted on the x and y axes, respectively? And then you want the points to be labeled with the treatments? If so, you can give this a shot. I am using the ggplot2
package.
I also added a color aesthetic to the pot. Feel free to drop that part if you don't want it.
df <- read.table(text = '"Treatment" "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
"treatment_a" 5.1 3.5 1.4 0.2
"treatment_a" 4.9 3 1.4 0.2
"treatment_a" 4.7 3.2 1.3 0.2
"treatment_b" 4.6 3.1 1.5 0.2
"treatment_b" 5 3.6 1.4 0.2
"treatment_b" 5.4 3.9 1.7 0.4
"treatment_c" 4.6 3.4 1.4 0.3
"treatment_c" 5 3.4 1.5 0.2
"treatment_c" 4.4 2.9 1.4 0.2
"treatment_d" 4.9 3.1 1.5 0.1
"treatment_d" 5.4 3.7 1.5 0.2
"treatment_d" 4.8 3.4 1.6 0.2
"treatment_e" 4.8 3 1.4 0.1
"treatment_e" 4.3 3 1.1 0.1
"treatment_e" 5.8 4 1.2 0.2
"treatment_f" 5.7 4.4 1.5 0.4
"treatment_f" 5.4 3.9 1.3 0.4
"treatment_f" 5.1 3.5 1.4 0.3
"treatment_g" 5.7 3.8 1.7 0.3
"treatment_g" 5.1 3.8 1.5 0.3
"treatment_g" 5.4 3.4 1.7 0.2
"treatment_h" 5.1 3.7 1.5 0.4
"treatment_h" 4.6 3.6 1 0.2
"treatment_h" 5.1 3.3 1.7 0.5', header = TRUE)
# run principle components, ignore first column
pr <- prcomp(df[, 2:5])
# run predict to get the first and second principle components
pr_pred <- predict(pr)
# put this into a data frame so we can use ggplot
df2 <- data.frame(Treatment = df$Treatment,
pr_pred[, 1:2])
library(ggplot2)
ggplot(data = df2, aes(x = PC1, y = PC2,
colour = Treatment,
label = Treatment)) +
geom_text()
To add these, we'll have to change how many categories there are. We'll go with three. Hopefully, in your actual data set, there are enough to draw the ellipses you are looking for.
df_mod <- read.table(text = '"Treatment" "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
"treatment_a" 5.1 3.5 1.4 0.2
"treatment_a" 4.9 3 1.4 0.2
"treatment_a" 4.7 3.2 1.3 0.2
"treatment_b" 4.6 3.1 1.5 0.2
"treatment_b" 5 3.6 1.4 0.2
"treatment_b" 5.4 3.9 1.7 0.4
"treatment_c" 4.6 3.4 1.4 0.3
"treatment_c" 5 3.4 1.5 0.2
"treatment_c" 4.4 2.9 1.4 0.2
"treatment_a" 4.9 3.1 1.5 0.1
"treatment_a" 5.4 3.7 1.5 0.2
"treatment_a" 4.8 3.4 1.6 0.2
"treatment_b" 4.8 3 1.4 0.1
"treatment_b" 4.3 3 1.1 0.1
"treatment_b" 5.8 4 1.2 0.2
"treatment_c" 5.7 4.4 1.5 0.4
"treatment_c" 5.4 3.9 1.3 0.4
"treatment_c" 5.1 3.5 1.4 0.3
"treatment_a" 5.7 3.8 1.7 0.3
"treatment_a" 5.1 3.8 1.5 0.3
"treatment_b" 5.4 3.4 1.7 0.2
"treatment_b" 5.1 3.7 1.5 0.4
"treatment_c" 4.6 3.6 1 0.2
"treatment_c" 5.1 3.3 1.7 0.5', header = TRUE)
pr_mod <- prcomp(df_mod[, 2:5])
pr_pred_mod <- predict(pr_mod)
df2_mod <- data.frame(Treatment = df_mod$Treatment,
pr_pred_mod[, 1:2])
ggplot(data = df2_mod, aes(x = PC1, y = PC2,
colour = Treatment,
label = Treatment)) +
geom_text() +
stat_ellipse(show.legend = FALSE)