PCA - Extract variables falling below a certain threshold from the Communalities table in R


I'm new here and still learning how to use R properly, and I need some expert help. I am currently using the EFA.dimensions package to run my PCA. My script looks like this:

PCA(data, corkind='pearson', Nfactors=11, Ncases=NULL)

From the "Communalities" table in the results (shown in the image below), I would like to extract the list of variables with an extraction communality below 0.80. In the example shown there is only one such variable, "ZLocomotionSocial", but I have another dataset that may contain many of them, so it would be great not to have to look for them one by one. If it helps, the final objective is to remove those variables from "data" and then re-run the PCA.

Communalities table example

Any suggestions on which code I can use to sort this out?


Solution

  • You can convert the communalities object into a data frame (a tibble) and then filter it with dplyr's filter():

    library(tidyverse)
    library(EFA.dimensions)
    
    communalities <-
      data_Harman %>%
      PCA(Nfactors=3, corkind = "pearson") %>%
      pluck("communalities") %>%
      as_tibble(rownames = "variable")
    #> 
    #> Ncases must be provided when data is a correlation matrix.
    #> 
    #> 
    #> Principal Components Analysis
    #> 
    #> Specified kind of correlations for this analysis: from user
    #> 
    #> The specified number of factors to extract = 3
    #> 
    #> Model Fit Coefficients:
    #> 
    #> RMSR = 0.046
    #> 
    #> GFI = 0.993
    #> 
    #> CAF = 0.5
    #> 
    #> 
    #> Eigenvalues and factor proportions of variance:
    #>             Eigenvalues    Proportion of Variance    Cumulative Prop. Variance
    #> Factor 1           4.67                      0.58                         0.58
    #> Factor 2           1.77                      0.22                         0.81
    #> Factor 3           0.48                      0.06                         0.87
    #> Factor 4           0.42                      0.05                         0.92
    #> Factor 5           0.23                      0.03                         0.95
    #> Factor 6           0.19                      0.02                         0.97
    #> Factor 7           0.14                      0.02                         0.99
    #> Factor 8           0.10                      0.01                         1.00
    #> 
    #> Unrotated PCA Loadings:
    #>               Factor 1   Factor 2   Factor 3
    #> Height           -0.86      -0.37      -0.07
    #> Arm.span         -0.84      -0.44       0.08
    #> Forearm          -0.81      -0.46       0.01
    #> Leg.length       -0.84      -0.40      -0.10
    #> Weight           -0.76       0.52      -0.15
    #> Hips             -0.67       0.53      -0.05
    #> Chest.girth      -0.62       0.58      -0.29
    #> Chest.width      -0.67       0.42       0.59
    #> 
    #> Promax Rotation Pattern Matrix:
    #>               Factor 1   Factor 2   Factor 3
    #> Height           -0.92       0.10      -0.05
    #> Arm.span         -0.95      -0.10       0.12
    #> Forearm          -0.95      -0.07       0.02
    #> Leg.length       -0.93       0.10      -0.10
    #> Weight           -0.07       0.87       0.06
    #> Hips              0.01       0.75       0.17
    #> Chest.girth       0.06       0.99      -0.13
    #> Chest.width       0.00       0.07       0.94
    #> 
    #> Promax Rotation Structure Matrix:
    #>               Factor 1   Factor 2   Factor 3
    #> Height           -0.94       0.44       0.37
    #> Arm.span         -0.95       0.35       0.42
    #> Forearm          -0.93       0.33       0.35
    #> Leg.length       -0.93       0.42       0.32
    #> Weight           -0.45       0.93       0.61
    #> Hips             -0.36       0.85       0.62
    #> Chest.girth      -0.30       0.89       0.44
    #> Chest.width      -0.40       0.64       0.99
    #> 
    #> Eigenvalues and factor proportions of variance:
    #>             Eigenvalues    Proportion of Variance    Cumulative Prop. Variance
    #> Factor 1           3.51                      1.17                         1.17
    #> Factor 2           2.33                      0.78                         1.95
    #> Factor 3           0.96                      0.32                         2.27
    #> 
    #> Promax Rotation Factor Correlations:
    #>            Factor 1   Factor 2   Factor 3
    #> Factor 1       1.00      -0.41      -0.39
    #> Factor 2      -0.41       1.00       0.60
    #> Factor 3      -0.39       0.60       1.00
    communalities
    #> # A tibble: 8 x 2
    #>   variable    Communalities
    #>   <chr>               <dbl>
    #> 1 Height              0.882
    #> 2 Arm.span            0.909
    #> 3 Forearm             0.872
    #> 4 Leg.length          0.871
    #> 5 Weight              0.872
    #> 6 Hips                0.742
    #> 7 Chest.girth         0.803
    #> 8 Chest.width         0.975
    
    selected_communalities <-
      communalities %>%
      filter(Communalities < 0.8)
    selected_communalities
    #> # A tibble: 1 x 2
    #>   variable Communalities
    #>   <chr>            <dbl>
    #> 1 Hips             0.742
    
    selected_variables <- selected_communalities$variable
    selected_variables
    #> [1] "Hips"
    

    Created on 2021-09-10 by the reprex package (v2.0.1)
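
For the final step mentioned in the question (removing those variables from "data" and re-running the PCA), here is a minimal base-R sketch. It assumes that "data" is a data frame of raw scores with one column per variable. The data frame built below is a hypothetical placeholder filled with random numbers, and the communalities are simply copied from the rounded output above, so treat it as an illustration rather than a definitive implementation.

```r
# Communalities copied from the printed output above (rounded to 3 decimals)
communalities <- c(Height = 0.882, Arm.span = 0.909, Forearm = 0.872,
                   Leg.length = 0.871, Weight = 0.872, Hips = 0.742,
                   Chest.girth = 0.803, Chest.width = 0.975)

# Names of the variables whose communality falls below the threshold
low_vars <- names(communalities)[communalities < 0.80]

# Hypothetical raw data: 20 rows of random numbers, one column per variable
# (placeholder only; use your own data frame here)
set.seed(1)
data <- as.data.frame(
  matrix(rnorm(20 * length(communalities)), nrow = 20,
         dimnames = list(NULL, names(communalities)))
)

# Keep only the columns that are NOT in the low-communality list
data_reduced <- data[, !(names(data) %in% low_vars), drop = FALSE]

# Re-run the PCA on the reduced data, mirroring the call from the question
# (requires EFA.dimensions; Nfactors may need adjusting after dropping variables):
# PCA(data_reduced, corkind = "pearson", Nfactors = 3)
```

If the input is a correlation matrix rather than raw data, the matching rows and columns would both need to be dropped, for example `R[!(rownames(R) %in% low_vars), !(colnames(R) %in% low_vars)]`, where `R` is a hypothetical name for that matrix.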