Search code examples
rpca

Online PCA in R


I'm trying to code online PCA in R, there is no existing implementation of this code available, thus, it may be useful for others as well. The pseudo-code can be found here (Algorithm 1). What I've done so far is as follows:

PCA<-function(X,k,epsilon){
    X_f<-norm(as.matrix(X),"f")
    d<-nrow(X)
    n<-ncol(X)
    l<-floor((8*k)/(epsilon^2))
    U<-matrix(0,d,l)
    C<-matrix(0,d,d)
    Y<-matrix(0,n,l)
    for(t in 1:n){
        r<-X[,t]-(U%*%t(U)%*%X[,t])
        n<-C + r%*%t(r)
        while(norm(n,"2") >= 2*(X_f^2)/l){
            lamb<-eigen(C)$values[1]
            u<-eigen(C)$vectors[,1]
            U<-cbind(U,u)
            #U[,which(!apply(U==0,2,all))]
            C<-C-(lamb*(u%*%t(u)))
            r<-X[,t]-(U%*%t(U)%*%X[,t])
        }
        C<-C+(r%*%t(r))
        y<-matrix(0,1,l)    
        y<-t(U)%*%x_t
        Y[t,]<-y
    }
    return(Y)
}

To test the code I used the famous fisher iris data:

log.ir <- log(iris[, 1:4])
ir.species <- iris[, 5]

ir.pca <- PCA(log.ir,50,0.2) 

There seems to be a bug in the code, which is not so obvious to me, the while loop never stops, can some one please help?


Solution

  • It's because while(norm(n,"2") >= 2*(X_f^2)/l) never finishes, 2*(X_f^2)/l) is always smaller than norm(n,"2")

    In fact if you print out the values of these, and debug(PCA) you'll see that they never change

    function(X,k,epsilon){
      X_f<-norm(as.matrix(X),"f")
      d<-nrow(X)
      n<-ncol(X)
      l<-floor((8*k)/(epsilon^2))
      U<-matrix(0,d,l)
      C<-matrix(0,d,d)
      Y<-matrix(0,n,l)
      for(t in 1:n){
        r<-X[,t]-(U%*%t(U)%*%X[,t])
        n<-C + r%*%t(r)
        while(norm(n,"2") >= 2*(X_f^2)/l){
          print(norm(n,"2") )
          print(2*(X_f^2)/l)
          lamb<-eigen(C)$values[1]
          u<-eigen(C)$vectors[,1]
          U<-cbind(U,u)
          U[,which(!apply(U==0,2,all))]
          C<-C-(lamb*(u%*%t(u)))
          r<-X[,t]-(U%*%t(U)%*%X[,t])
        }
        C<-C+(r%*%t(r))
        y<-matrix(0,1,l)    
        y<-t(U)%*%x_t
        Y[t,]<-y
      }
      return(Y)
    }
    
    debug(PCA)
    

    In general using print statements inside of functions you want to debug is a good way to diagnose problems.