Tags: r, pca, eigenvector, eigenvalue

What is the fastest way to calculate first two principal components in R?


I am using princomp in R to perform PCA. My data matrix is huge (10K x 10K, with each value given to up to 4 decimal places). It takes ~3.5 hours and ~6.5 GB of physical memory on a Xeon 2.27 GHz processor.

Since I only want the first two components, is there a faster way to do this?

Update:

In addition to speed, is there a memory-efficient way to do this?

It takes ~2 hours and ~6.3 GB of physical memory to calculate the first two components using svd(,2,).


Solution

  • You sometimes get access to so-called 'economical' decompositions, which allow you to cap the number of eigenvalues / eigenvectors computed. It looks like eigen() and prcomp() do not offer this, but svd() allows you to specify the maximum number of singular vectors to compute (through its nu and nv arguments).

    On small matrices, the gains seem modest:

    R> set.seed(42); N <- 10; M <- matrix(rnorm(N*N), N, N)
    R> library(rbenchmark)
    R> benchmark(eigen(M), svd(M,2,0), prcomp(M), princomp(M), order="relative")
              test replications elapsed relative user.self sys.self user.child
    2 svd(M, 2, 0)          100   0.021  1.00000      0.02        0          0
    3    prcomp(M)          100   0.043  2.04762      0.04        0          0
    1     eigen(M)          100   0.050  2.38095      0.05        0          0
    4  princomp(M)          100   0.065  3.09524      0.06        0          0
    R> 
    

    but the factor of three relative to princomp() may make it worth your while to reconstruct princomp() from svd(), since svd() allows you to stop after two components.
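
One way the reconstruction from svd() might look is sketched below. This is only an illustration under stated assumptions: first_pcs() is a hypothetical helper name (not part of base R), the data are column-centered but not rescaled, and the standard deviations follow the prcomp() convention (divisor n - 1), whereas princomp() uses divisor n. Note that svd() still returns all singular values in d; only the number of singular vectors is capped, so this addresses speed more than memory.

```r
# Sketch: first k principal components from a capped svd() call.
# first_pcs() is a hypothetical helper, not a base R function.
first_pcs <- function(X, k = 2) {
  # Column-center the data (no rescaling), as prcomp() does by default
  Xc <- scale(X, center = TRUE, scale = FALSE)
  # nu = 0 skips the left singular vectors; nv = k keeps k right singular vectors
  s <- svd(Xc, nu = 0, nv = k)
  list(
    sdev     = s$d[seq_len(k)] / sqrt(nrow(X) - 1),  # prcomp() convention
    rotation = s$v,                                  # loadings, p x k
    scores   = Xc %*% s$v                            # projected data, n x k
  )
}

set.seed(42)
M   <- matrix(rnorm(200), 20, 10)
pc  <- first_pcs(M, 2)
ref <- prcomp(M)

# Singular vectors are only defined up to sign, so compare absolute values
all.equal(abs(pc$scores), abs(unname(ref$x[, 1:2])))
all.equal(pc$sdev, ref$sdev[1:2])
```

To match princomp() output instead, the sdev values would need to be multiplied by sqrt((n - 1) / n), where n is the number of rows.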