
Best way to minimize residual sum of squares in R; is nlm function the right way?


I have five points for which I will fit a 2nd order polynomial in R. I will do so by minimizing the sum of squared errors of prediction (SSE).

What's the best way to do so?

So far I have done this:

(1,5.8), (2,3.9), (3,4.2), (4,5.7), (5,10.2) ## my data

To this data I want to fit a 2nd order polynomial with the intercept fixed at 10 and the coefficient of x^2 fixed at 1. I do this:

p<-c(5.8, 3.9, 4.2, 5.7, 10.2)

f<-function(x,a){x^2-ax+10}
##sum of squared errors of prediction
SSE<-function(a){ (p[i]-f(i,a) )^2 }

nlm(SSE,0.1) ## 0.1 is picked totally at random

But it returns a "wrong" answer:

$minimum
[1] 9.475923e-25

$estimate
[1] 4.96

When I manually calculate the SSE for a = 4.96, I indeed get SSE = 9.475923e-25. What am I doing wrong? My biggest concern: does the nlm function run through all my data points?


Solution

  • I'm not sure what you mean by "wrong". There are a few glitches in your code, and I get a slightly different answer (a is almost exactly 5).
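
    Here's a guess at where your a = 4.96 came from (an assumption on my part, since your post doesn't show the rest of your workspace): your SSE refers to an index i that it never defines, so R falls back on whatever i happens to be lying around globally. If that leftover i was 5, nlm was minimizing the squared error at the single point (5, 10.2), which it can drive to essentially zero at a = (35 - 10.2)/5 = 4.96, matching both your estimate and your near-zero minimum. In other words, as written, nlm was not running through all your data points.

    ## hypothetical reconstruction of the buggy run, assuming a stray i <- 5
    ## (with the missing * in ax filled in so the function evaluates at all)
    p <- c(5.8, 3.9, 4.2, 5.7, 10.2)
    f0 <- function(x,a){x^2-a*x+10}
    i <- 5                                   ## leftover index in the workspace
    SSE0 <- function(a){ (p[i]-f0(i,a))^2 }  ## error at one point, not a sum
    nlm(SSE0,0.1)$estimate                   ## ~4.96, with minimum ~1e-25

    With the glitches fixed: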

    p <- c(5.8, 3.9, 4.2, 5.7, 10.2)
    x <- 1:5
    f <- function(x,a){x^2-a*x+10}
    ##sum of squared errors of prediction
    SSE <- function(a){ sum((p-f(x,a))^2) }
    (n1 <- nlm(SSE,0.1)) ## 0.1 is picked totally at random
    ## $minimum
    ## [1] 0.22
    ## $estimate
    ## [1] 4.999998
    ## $gradient
    ## [1] 1.443291e-10
    ## $code
    ## [1] 1
    ## $iterations
    ## [1] 3
    

    This seems like a reasonable fit, although it doesn't match the data points exactly.
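
    To address your biggest concern directly: with the corrected SSE, all five points now contribute; every residual is nonzero, and the squared residuals sum to the reported minimum.

    round(p - f(x, n1$estimate), 2)  ## residuals at the fitted a
    ## [1] -0.2 -0.1  0.2 -0.3  0.2
    sum((p - f(x, n1$estimate))^2)   ## 0.22, same as $minimum above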

    par(las=1,bty="l") ## cosmetic
    plot(x,p,xlim=c(0,5.5))
    xvec <- seq(0,6,length.out=100)
    lines(xvec,f(xvec,n1$estimate))
    

    [plot: the five data points with the fitted parabola overlaid]

    You could also fit this model via

    ypar <- p-x^2-10   ## subtract known terms from LHS
    m1 <- lm(ypar~x-1) ## fit linear term (-1 suppresses intercept)
    -1*coef(m1)        ## flip sign
    

    which gives a result of exactly 5.
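
    For a no-intercept regression on a single predictor, the least-squares coefficient has the closed form sum(x*y)/sum(x^2), so you can confirm the exact value by hand: sum(x*ypar) = -275 and sum(x^2) = 55, giving -275/55 = -5 and hence a = 5.

    ## closed-form check of the lm coefficient
    -sum(x * ypar) / sum(x^2)  ## exactly 5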