
Best way to minimize residual sum of squares in R; is nlm function the right way?


I have five points for which I will fit a 2nd order polynomial in R. I will do so by minimizing the sum of squared errors of prediction (SSE).

What's the best way to do so?

So far I have done this:

(1,5.8), (2,3.9), (3,4.2), (4,5.7), (5,10.2) ## my data

To this data I want to fit a 2nd order polynomial with the intercept fixed at 10 and the coefficient of x^2 fixed at 1. I do this:

p<-c(5.8, 3.9, 4.2, 5.7, 10.2)

f<-function(x,a){x^2-ax+10}
##sum of squared errors of prediction
SSE<-function(a){ (p[i]-f(i,a) )^2 }

nlm(SSE,0.1) ## 0.1 is picked totally at random

But it returns a "wrong" answer:

$minimum
[1] 9.475923e-25

$estimate
[1] 4.96

When I manually calculate the SSE for a = 4.96, I indeed get SSE = 9.475923e-25. What am I doing wrong? My biggest concern: does the nlm function run through all my data points?


Solution

  • I'm not sure what you mean by "wrong". There are a few glitches in your code, and I get a slightly different answer (a is almost exactly 5).
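
    Here's a guess at where your a = 4.96 came from (an assumption on my part, since your post doesn't show the rest of your workspace): your SSE refers to an index i that it never defines, so R falls back on whatever i happens to be lying around globally. If that leftover i was 5, nlm was minimizing the squared error at the single point (5, 10.2), which it can drive to essentially zero at a = (35 - 10.2)/5 = 4.96, matching both your estimate and your near-zero minimum. In other words, as written, nlm was not running through all your data points.

    ## hypothetical reconstruction of the buggy run, assuming a stray i <- 5
    ## (with the missing * in ax filled in so the function evaluates at all)
    p <- c(5.8, 3.9, 4.2, 5.7, 10.2)
    f0 <- function(x,a){x^2-a*x+10}
    i <- 5                                   ## leftover index in the workspace
    SSE0 <- function(a){ (p[i]-f0(i,a))^2 }  ## error at one point, not a sum
    nlm(SSE0,0.1)$estimate                   ## ~4.96, with minimum ~1e-25

    With the glitches fixed: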

    p <- c(5.8, 3.9, 4.2, 5.7, 10.2)
    x <- 1:5
    f <- function(x,a){x^2-a*x+10}
    ##sum of squared errors of prediction
    SSE <- function(a){ sum((p-f(x,a))^2) }
    (n1 <- nlm(SSE,0.1)) ## 0.1 is picked totally at random
    ## $minimum
    ## [1] 0.22
    ## $estimate
    ## [1] 4.999998
    ## $gradient
    ## [1] 1.443291e-10
    ## $code
    ## [1] 1
    ## $iterations
    ## [1] 3
    

    This seems like a reasonable fit, although it doesn't match the data points exactly.
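
    To address your biggest concern directly: with the corrected SSE, all five points now contribute; every residual is nonzero, and the squared residuals sum to the reported minimum.

    round(p - f(x, n1$estimate), 2)  ## residuals at the fitted a
    ## [1] -0.2 -0.1  0.2 -0.3  0.2
    sum((p - f(x, n1$estimate))^2)   ## 0.22, same as $minimum above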

    par(las=1,bty="l") ## cosmetic
    plot(x,p,xlim=c(0,5.5))
    xvec <- seq(0,6,length.out=100)
    lines(xvec,f(xvec,n1$estimate))
    

    [plot: the five data points with the fitted parabola overlaid]

    You could also fit this model via

    ypar <- p-x^2-10   ## subtract known terms from LHS
    m1 <- lm(ypar~x-1) ## fit linear term (-1 suppresses intercept)
    -1*coef(m1)        ## flip sign
    

    which gives a result of exactly 5.
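
    For a no-intercept regression on a single predictor, the least-squares coefficient has the closed form sum(x*y)/sum(x^2), so you can confirm the exact value by hand: sum(x*ypar) = -275 and sum(x^2) = 55, giving -275/55 = -5 and hence a = 5.

    ## closed-form check of the lm coefficient
    -sum(x * ypar) / sum(x^2)  ## exactly 5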