
R - Limit output of summary.princomp


I'm running a principal component analysis on a dataset with more than 1000 variables. I'm using RStudio, and when I run summary() to see the cumulative variance of the components, I can only see the last few hundred components. How do I limit the summary to show only, say, the first 100 components?


Solution

  • It's fairly easy to modify print.summary.princomp to do this (you can see the original code by typing stats:::print.summary.princomp). The version below adds an argument n, the number of leading components to print:

    pcaPrint <- function (x, digits = 3, loadings = x$print.loadings, cutoff = x$cutoff, n, ...) 
    {
        # Check for a sensible value of n; default to full output
        if (missing(n) || n > length(x$sdev) || n < 1) {n <- length(x$sdev)}
        vars <- x$sdev^2
        vars <- vars/sum(vars)
        cat("Importance of components:\n")
        print(rbind(`Standard deviation` = x$sdev[1:n], `Proportion of Variance` = vars[1:n], 
            `Cumulative Proportion` = cumsum(vars)[1:n]))
        if (loadings) {
            cat("\nLoadings:\n")
            cx <- format(round(x$loadings, digits = digits))
            cx[abs(x$loadings) < cutoff] <- paste(rep(" ", nchar(cx[1, 
                1], type = "w")), collapse = "")
            # drop = FALSE keeps the matrix layout even when n = 1
            print(cx[, 1:n, drop = FALSE], quote = FALSE, ...)
        }
        invisible(x)
    }
    
    pcaPrint(summary(princomp(USArrests, cor = TRUE),
                     loadings = TRUE, cutoff = 0.2), digits = 2, n = 2)
    

    Edited to include a basic check for a sensible value of n. Having done this, I wonder whether it is worth suggesting to R Core as a permanent addition; it seems simple and potentially useful.
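
If only the variance table is needed and not the loadings, the same numbers can be built directly from the sdev component of the fitted object and then subset by column. The following is a minimal sketch of that idea rather than a replacement for the print method; the names importance, n_show and top are illustrative, and USArrests (4 variables) stands in for the much wider dataset in the question.

```r
# Fit the PCA on a built-in dataset (4 variables, so 4 components).
pca <- princomp(USArrests, cor = TRUE)

# Proportion of total variance explained by each component.
vars <- pca$sdev^2 / sum(pca$sdev^2)

# Same three rows that the summary print method shows.
importance <- rbind(
  `Standard deviation`     = pca$sdev,
  `Proportion of Variance` = vars,
  `Cumulative Proportion`  = cumsum(vars)
)

# Keep only the first n_show components (hypothetical name);
# min() guards against asking for more components than exist.
n_show <- 2
keep <- seq_len(min(n_show, ncol(importance)))
top <- importance[, keep, drop = FALSE]

print(round(top, 3))
```

Because top is an ordinary numeric matrix, it can also be transposed with t(top) to list one component per row, or written out with write.csv(), which may be easier to read than console output when there are hundreds of components.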