I can apply PCA on the classic Iris dataset to obtain the cumulative proportion per dimension:
library(tidyverse)
x <- iris[,1:4] %>% as.matrix()
pca <- prcomp(x)
summary(pca)
But I don't know how to do that with tidymodels. My code so far is:
library(tidymodels)
iris_vars <- iris %>% select(-Species)
iris_rec <- recipe(~., iris_vars) %>%
  step_pca(all_predictors())
iris_prep <- prep(iris_rec)
iris_tidy <- tidy(iris_prep,1)
iris_tidy
summary(iris_tidy)
I would like to obtain this with tidymodels:
Importance of components:
                          PC1     PC2    PC3     PC4
Standard deviation     2.0563 0.49262 0.2797 0.15439
Proportion of Variance 0.9246 0.05307 0.0171 0.00521
Cumulative Proportion  0.9246 0.97769 0.9948 1.00000
Any help will be greatly appreciated.
You can get the same results if you use the same model. prcomp() defaults to center = TRUE, whereas step_pca() defaults to center = FALSE. In the following, I use centering and scaling for both (since this is often recommended).
library("tidymodels")
x <- iris[,1:4] %>% as.matrix()
pca <- prcomp(x, scale. = TRUE)
summary(pca)
#> Importance of components:
#>                           PC1    PC2     PC3     PC4
#> Standard deviation     1.7084 0.9560 0.38309 0.14393
#> Proportion of Variance 0.7296 0.2285 0.03669 0.00518
#> Cumulative Proportion  0.7296 0.9581 0.99482 1.00000
iris_rec <- recipe(Species ~ ., iris) %>%
  step_normalize(all_predictors()) %>%
  step_pca(all_predictors())
iris_prep <- prep(iris_rec)
summary(iris_prep$steps[[2]]$res)
#> Importance of components:
#>                           PC1    PC2     PC3     PC4
#> Standard deviation     1.7084 0.9560 0.38309 0.14393
#> Proportion of Variance 0.7296 0.2285 0.03669 0.00518
#> Cumulative Proportion  0.7296 0.9581 0.99482 1.00000
Created on 2020-05-29 by the reprex package (v0.3.0)
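If you want the unscaled numbers from the question instead, one option is to switch on centering inside step_pca() itself. The sketch below assumes that the options argument of step_pca() is passed on to prcomp(), and that the fitted prcomp object sits in the res element of the prepped step, as it does above. The names pca_res and pca_var are only illustrative. The last part rebuilds the proportions from the standard deviations, in case a tibble is more convenient than the printed summary.

```r
library(tidymodels)

# Centre (but do not scale) inside step_pca(); `options` is assumed to be
# forwarded to prcomp(), mirroring prcomp(x) from the question.
iris_rec <- recipe(~ ., data = iris[, 1:4]) %>%
  step_pca(all_predictors(), options = list(center = TRUE))
iris_prep <- prep(iris_rec)

# step_pca() is the first (and only) step here, so index 1.
pca_res <- iris_prep$steps[[1]]$res
summary(pca_res)

# The same quantities as a tibble, derived from the standard deviations.
pca_var <- tibble(
  component = paste0("PC", seq_along(pca_res$sdev)),
  std_dev   = pca_res$sdev,
  prop_var  = pca_res$sdev^2 / sum(pca_res$sdev^2),
  cum_prop  = cumsum(prop_var)
)
pca_var
```

With centering only, summary(pca_res) should line up with the table requested in the question. Note that the index into steps changes with the recipe: it was 2 above because step_normalize() came first.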