Tags: r, ggplot2, pca, ggbiplot

ggbiplot graphical display in groups


I am learning to draw biplots with the wine data set. How does R know that Barolo, Grignolino and Barbera are the values of wine.class when there is no wine class column in the data set?

More details about the wine data set are in the following links:

ggbiplot - how not to use the feature vectors in the plot

https://github.com/vqv/ggbiplot

Thanks very much


Solution

  • The wine dataset consists of two objects. The first is a data.frame, wine, with 178 observations of 13 quantitative variables:

    str(wine)
    'data.frame':   178 obs. of  13 variables:
     $ Alcohol       : num  14.2 13.2 13.2 14.4 13.2 ...
     $ MalicAcid     : num  1.71 1.78 2.36 1.95 2.59 1.76 1.87 2.15 1.64 1.35 ...
     $ Ash           : num  2.43 2.14 2.67 2.5 2.87 2.45 2.45 2.61 2.17 2.27 ...
     $ AlcAsh        : num  15.6 11.2 18.6 16.8 21 15.2 14.6 17.6 14 16 ...
     $ Mg            : int  127 100 101 113 118 112 96 121 97 98 ...
     $ Phenols       : num  2.8 2.65 2.8 3.85 2.8 3.27 2.5 2.6 2.8 2.98 ...
     $ Flav          : num  3.06 2.76 3.24 3.49 2.69 3.39 2.52 2.51 2.98 3.15 ...
     $ NonFlavPhenols: num  0.28 0.26 0.3 0.24 0.39 0.34 0.3 0.31 0.29 0.22 ...
     $ Proa          : num  2.29 1.28 2.81 2.18 1.82 1.97 1.98 1.25 1.98 1.85 ...
     $ Color         : num  5.64 4.38 5.68 7.8 4.32 6.75 5.25 5.05 5.2 7.22 ...
     $ Hue           : num  1.04 1.05 1.03 0.86 1.04 1.05 1.02 1.06 1.08 1.01 ...
     $ OD            : num  3.92 3.4 3.17 3.45 2.93 2.85 3.58 3.58 2.85 3.55 ...
     $ Proline       : int  1065 1050 1185 1480 735 1450 1290 1295 1045 1045 ...
    

    There is also a separate factor, wine.class, with 178 elements (one per row of wine) that gives the class of each wine:

    str(wine.class)
     Factor w/ 3 levels "barolo","grignolino",..: 1 1 1 1 1 1 1 1 1 1 ...
    

    The 13 quantitative variables are used to compute the PCA:

    wine.pca <- prcomp(wine, scale. = TRUE)
    

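    while the wine.class factor is used only to color the points on the plot, one color per wine class. It is passed to ggbiplot separately, through its groups argument, which is why it never needs to be a column of wine.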
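
The separation between the PCA input and the grouping factor can be sketched as follows. This is a minimal illustration, not the ggbiplot internals: it uses the built-in iris data as a stand-in so that it runs without extra packages, and the names measurements, groups, pca and scores are made up for the example. The last part repeats the same idea on the wine data and only runs if ggbiplot is installed.

```r
# Stand-in data: numeric columns play the role of `wine`,
# the Species factor plays the role of `wine.class`.
measurements <- iris[, 1:4]
groups       <- iris$Species

# The PCA only ever sees the numeric columns.
pca <- prcomp(measurements, scale. = TRUE)

# Row i of pca$x and element i of groups describe the same observation,
# so the factor is matched to the scores purely by position.
scores <- data.frame(pca$x[, 1:2], group = groups)
stopifnot(nrow(scores) == length(groups))

# Base-graphics score plot, colored by group.
plot(scores$PC1, scores$PC2, col = as.integer(scores$group), pch = 19,
     xlab = "PC1", ylab = "PC2")
legend("topright", legend = levels(scores$group),
       col = seq_along(levels(scores$group)), pch = 19)

# Same idea with the wine data, assuming the ggbiplot package is installed.
if (requireNamespace("ggbiplot", quietly = TRUE)) {
  library(ggbiplot)
  data(wine)  # loads both `wine` and `wine.class`
  wine.pca <- prcomp(wine, scale. = TRUE)
  print(ggbiplot(wine.pca, groups = wine.class, ellipse = TRUE))
}
```

Because the match is positional, reordering or subsetting the rows of the data without applying the same operation to the factor would silently mislabel the points.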