Here is my reproduceable data and code:
dd<-structure(list(chr1_1005501 = c(0.597222222222222, 0.75, 0.775,
0.732456140350877, 0.860696517412935, 0.777777777777778, 0.654545454545455,
0.811224489795918, 0.742424242424242, 0.806146572104019, 0.828402366863905,
0.718181818181818, 0.85663082437276, 0.853868194842407, 0.849624060150376,
0.759615384615385, 0.798611111111111, 1), chr1_1018001 = c(0.615384615384615,
1, 0.95959595959596, 0.976415094339623, 0.979166666666667, 0.920634920634921,
0.981481481481482, 1, 0.94578313253012, 0.967213114754098, 0.986486486486487,
0.952380952380952, 0.965116279069767, 0.986238532110092, 0.975206611570248,
0.941860465116279, 0.973333333333333, 0.866666666666667), chr1_1021001 = c(0.286713286713287,
0.5625, 0.606741573033708, 0.509433962264151, 0.233766233766234,
0.402684563758389, 0.696, 0.454545454545455, 0.495934959349593,
0.381642512077295, 0.444444444444444, 0.580645161290323, 0.443708609271523,
0.482269503546099, 0.595890410958904, 0.422535211267606, 0.397435897435897,
0.7), chr1_102234501 = c(0.782312925170068, 0.857142857142857,
0.884057971014493, 0.979020979020979, 0.916666666666667, 0.930555555555556,
0.976190476190476, 0.939393939393939, 0.935064935064935, 0.9,
0.939024390243902, 0.966666666666667, 0.953125, 0.920634920634921,
0.961165048543689, 0.848484848484849, 1, 1), chr1_1032001 = c(0.846153846153846,
0.666666666666667, 0.72, 0.701492537313433, 0.423076923076923,
0.526315789473684, 0.658536585365854, 0.729166666666667, 0.659090909090909,
0.55, 0.46875, 0.586206896551724, 0.574074074074074, 0.621212121212121,
0.597014925373134, 0.617647058823529, 0.777777777777778, 0.7),chr1_10399501 = c(0.210045662100457, 0.294117647058824, 0.0393518518518519,
0.0558739255014327, 0.123737373737374, 0.0852994555353902,
0.0453216374269006, 0.106451612903226, 0.119584055459272,
0.0448275862068966, 0.0467741935483871, 0.0831643002028398,
0.0418604651162791, 0.0635451505016722, 0.161290322580645), chr1_10400501 = c(0.634020618556701, 0.794117647058823,
0.895833333333333, 0.833333333333333, 0.928571428571429,
0.96969696969697, 0.892857142857143, 0.897435897435897, 0.858585858585859,
0.869565217391304, 0.9, 0.879194630872483, 0.875, 0.916666666666667,
0.911111111111111, 0.875, 0.8, 0.848484848484849), chr1_104933001 = c(0.892215568862275,
1, 0.347826086956522, 0.365853658536585, 0.603174603174603,
0.717948717948718, 0.632911392405063, 0, 0.472727272727273,
0.4, 0.478260869565217, 0.646153846153846, 0.540229885057471,
0.723076923076923, 0.574712643678161, 0.760869565217391,
0.4375, 0.285714285714286), chr1_10530501 = c(0.528301886792453,
0.849056603773585, 0.803030303030303, 0.821782178217822,
0.888888888888889, 0.827586206896552, 0.773584905660377,
0.8125, 0.925373134328358, 0.849462365591398, 0.921052631578947,
0.86046511627907, 0.866666666666667, 0.75, 0.911764705882353,
0.857142857142857, 0.833333333333333, 0.914634146341463),
chr1_1058501 = c(0, 0.0434782608695652, 0.520833333333333,
0.444444444444444, 0.370967741935484, 0.309859154929577,
0.220338983050847, 0.477272727272727, 0.344827586206897,
0.524590163934426, 0.294117647058824, 0.168067226890756,
0.338983050847458, 0.446153846153846, 0.39344262295082, 0.444444444444444,
0.319148936170213, 0.136363636363636), Sample = structure(c(1L,
1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L,
6L), levels = c("A", "B", "C", "D", "E", "F"), class = "factor")), row.names = c("A1",
"A2", "B1", "B2", "C1",
"C2", "D1", "D2", "E1",
"E2", "E3", "F1", "F2",
"F3", "F4", "F5", "F6",
"F7"), class = "data.frame")
dd
pca_result <- prcomp(dd[,-ncol(dd)],
scale=T, #
center = T
#num.cores = 30
)
Sample=factor(c(rep("A",2),
rep("B",2),
rep("C",2),
rep("D",2),
rep("E",3),
rep("F",7)),levels = unique(dd$Sample))
ggbiplot(pca_result,
var.axes=F, #
obs.scale = 1, #
var.scale = 1,
groups = Sample, #
ellipse = F, #
#ellipse.prob = 0.67,
circle = F)+
geom_point(aes(color = Sample), size = 1) +
#geom_encircle(aes(fill=Sample), alpha = 0.3, show.legend = F,colour=NA) +
ggforce::geom_mark_ellipse(aes(fill = Sample,
color = Sample))
##############
I made a pca plot and then I want to add confidence ellipse by ggplot2 or ggforce or factoextra or ggbiplot package.
But there is a small problem that there are 6 groups but some of them have only two samples,
So I tried ggplot2 or factoextra and set related parameters:
such as: in ggbiplot
ellipse.type = "confidence", ellipse.level = 0.95
Or in factoextra: I used fviz_pca_biplot() or fviz_pca_ind():
fviz_pca_biplot(pca_result,
col.ind = Sample,
addEllipses = T,
ellipse.type = "confidence",
ellipse.level = 0.95,
col.var = "black",
repel = TRUE )
But it reminds me with: Too few points to calculate an ellipse
So, I tried the last choice: ggforce.
+ggforce::geom_mark_ellipse(aes(fill = Sample,
color = Sample))
It works well and I got beautiful figure.
However, my question is I am not sure if the regions in my plot mean confidence intervel or other meaning that I don't know. Many packages like factoextra have said that circles produced by it means confidence interval. But I didn't see similar description like it in ggforce package.
So could somebody give me some advice or explanation. Thanks in advance.
The ellipses in geom_mark_ellipse
simply draws the smallest possible ellipse that can enclose the points in each group. As the help file states:
The enclosing ellipses are estimated using the Khachiyan algorithm
The Khachiyan algorithm is a method for determining the smallest volume ellipsoid that can enclose a group of points, and in a 2D space such as your biplot, this means the smallest ellipse that can cover all the points.
geom_mark_ellipse
is really intended for annotations, since by default it will expand the ellipse by 5mm and allows for labels within each ellipse. This is why it is not immediately apparent that the function works by finding the minimum enclosing ellipse. If you look carefully at your plot, you will notice that some of the groups are not true ellipses due to this fixed - distance expansion.
You can get the actual smallest ellipse by setting expand = 0
, where it becomes apparent that the ellipse is simply the smallest ellipse that encloses all the points:
ggbiplot(pca_result,
var.axes = FALSE,
obs.scale = 1,
var.scale = 1,
groups = Sample,
ellipse = FALSE,
circle = FALSE)+
geom_point(aes(color = Sample), size = 1) +
ggforce::geom_mark_ellipse(aes(fill = Sample,
color = Sample), expand = unit(0, 'mm'),
tol = 0.001)
Here we can see that the ellipses are coincident with the outermost points in each group. Where there are only two points, the result should probably just be a zero-width ellipse (i.e. a straight line), but there seems to be a maximum aspect ratio built in to keep the shape obviously elliptical for aesthetic purposes.