python r pca

How to get BIC/AIC plot for selecting number of Principal Components in Python or R


I want to get a plot like this one for selecting the number of components in a PCA: [image]

I am, however, stuck trying to code the BIC/AIC manually. Are there any packages in either R or Python that can help me get this? Any sample code would help greatly.

Thank you


Solution

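  • Here is a link to some example R code that computes AIC and BIC, as well as forward/backward/stepwise variable selection. All credit goes to Jo Hardin. I will reproduce part of the code below for convenience, slightly edited for formatting. In the output, each criterion is first computed by hand from the residual sum of squares and then checked against the built-in function: AIC(model, k=2) is the AIC, and AIC(model, k=log(n)) is the BIC.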

    > sat.data <- read.table("sat.csv", header=TRUE, sep=",")
    > attach(sat.data)
    > sat.n <- nrow(sat.data) # be careful with missing values!!
    > ltakers <- log(takers) # variable is quite right skewed
    

    AIC and BIC in R

    Method 1 (intercept-only model):

    > sat.lm0 <- lm(sat ~ 1)
    > summary(sat.lm0)
    
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)   948.45      10.21   92.86   <2e-16 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    Residual standard error: 71.5 on 48 degrees of freedom
    
    > sat.sse0 <- sum(resid(sat.lm0) ^2)
    > sat.n + sat.n*log(2*pi) + sat.n * log(sat.sse0 / sat.n) + 2 * (1+1)
    [1] 560.4736
    > AIC(sat.lm0, k=2)
    [1] 560.4736
    > sat.n + sat.n * log(2*pi) + sat.n*log(sat.sse0/sat.n) + log(sat.n)*(1+1)
    [1] 564.2573
    > AIC(sat.lm0, k=log(sat.n))
    [1] 564.2573
    

    Method 2 (model with ltakers as the predictor):

    > sat.lm1 <- lm(sat ~ ltakers)
    > summary(sat.lm1)
    
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept) 1112.408     12.386   89.81   <2e-16 ***
    ltakers      -59.175      4.167  -14.20   <2e-16 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    Residual standard error: 31.41 on 47 degrees of freedom
    Multiple R-squared: 0.811, Adjusted R-squared: 0.807
    F-statistic: 201.7 on 1 and 47 DF, p-value: < 2.2e-16
    
    > sat.sse1 <- sum(resid(sat.lm1) ^2)
    > sat.n + sat.n*log(2*pi) + sat.n * log(sat.sse1 / sat.n) + 2 * (2+1)
    [1] 480.832
    > AIC(sat.lm1, k=2)
    [1] 480.832
    > sat.n + sat.n * log(2*pi) + sat.n*log(sat.sse1/sat.n) + log(sat.n) * (2+1)
    [1] 486.5075
    > AIC(sat.lm1, k=log(sat.n))
    [1] 486.5075
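
Since the question also asks about Python, the hand-written expression in the R transcript above can be sketched as a small Python function. This is only a translation of that formula for a least-squares fit with Gaussian errors, not a PCA-specific criterion; the names `gaussian_ic`, `aic` and `bic` are made up for this illustration, and the SSE values in the demo loop are hypothetical numbers, not the SAT data.

```python
import math


def gaussian_ic(sse, n, n_coef, penalty=2.0):
    """Information criterion for a least-squares fit with Gaussian errors.

    Mirrors the R expression
        n + n*log(2*pi) + n*log(sse/n) + penalty*(n_coef + 1)
    where the "+ 1" counts the estimated error variance as a parameter.
    """
    if n <= 0 or sse <= 0:
        raise ValueError("n and sse must be positive")
    # -2 * maximized log-likelihood of the Gaussian linear model
    neg2_loglik = n + n * math.log(2 * math.pi) + n * math.log(sse / n)
    return neg2_loglik + penalty * (n_coef + 1)


def aic(sse, n, n_coef):
    # Penalty of 2 per parameter, like AIC(model, k=2) in R
    return gaussian_ic(sse, n, n_coef, penalty=2.0)


def bic(sse, n, n_coef):
    # Penalty of log(n) per parameter, like AIC(model, k=log(n)) in R
    return gaussian_ic(sse, n, n_coef, penalty=math.log(n))


if __name__ == "__main__":
    # Hypothetical candidates: number of coefficients -> residual sum of squares
    candidates = {1: 120.0, 2: 80.0, 3: 78.5}
    n_obs = 50
    for n_coef, sse in candidates.items():
        print(n_coef, round(aic(sse, n_obs, n_coef), 2), round(bic(sse, n_obs, n_coef), 2))
```

As a sanity check, the first summary reports a residual standard error of 71.5 on 48 degrees of freedom, so the SSE is roughly 71.5^2 * 48 with n = 49; passing those values to `aic` with one coefficient gives a result that agrees with R's 560.4736 up to the rounding of 71.5. To get a plot of the kind asked about, the same functions would be evaluated for each candidate model and the values plotted against model size, with the minimum marking the preferred model.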