
How to use optim()'s gradient function?


How do I use optim()'s gradient argument to fit, say, $f(x) = ax^2+bx+c$ to a given set of $(x, y)$ data? I have searched for hours and found no decent explanation.

I believe the gradient function should return a vector of length three in this case: the partial derivative of the fit metric with respect to $a$, then with respect to $b$, then with respect to $c$. But I am not sure how to implement that.
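
In other words, with fit metric $S(a, b, c)$, the gradient should be $\nabla S = \left( \frac{\partial S}{\partial a}, \frac{\partial S}{\partial b}, \frac{\partial S}{\partial c} \right)$.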

I have the following code for $f(x) = ax^2+bx+c$; is my gradient function correct?

    x <- 1:10
    y <- c(-0.2499211, -4.6645685, -2.6280750, -2.0146818, 1.5632500,
           0.2043376, 2.9151158, 4.0967775, 6.8184074, 12.5449975)
    
    # find min square distance
    my.fit.fun <- function(my.par) {
      sum(sqrt(abs(my.par[1]*x^2 + my.par[2]*x + my.par[3] - y^2)))
    }
    
    gradient <- function(my.par) {
      c(my.par[1]*2, my.par[2], 0)
    }
    
    optim.out <- optim(c(0.2, -4, -5), fn = my.fit.fun, gr = gradient, method = "BFGS")

Solution

  • First, I would use the sum of squares as the objective instead of the square root of the absolute value. You could do the following:

    x <- 1:10
    y <- c(-0.2499211, -4.6645685, -2.6280750, -2.0146818, 1.5632500,
           0.2043376, 2.9151158, 4.0967775, 6.8184074, 12.5449975)
    
    d <- data.frame(x,y)
    
    fun <- function(par, data){
      # predicted values and residual sum of squares
      y_hat <- data$x^2 * par[1] + data$x * par[2] + par[3]
      sum((data$y - y_hat)^2)
    }
    
    optim(c(0.2, -4, -5), fun, data = d)
    $par
    [1]  0.2531111 -1.3135297 -0.6618520
    
    $value
    [1] 17.70251
    
    $counts
    function gradient 
         176       NA 
    
    $convergence
    [1] 0
    
    $message
    NULL
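
    Note that when no method argument is given, optim defaults to the derivative-free Nelder-Mead algorithm; that is why the gradient count in the output above is NA.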
    

    Instead of using optim, I would use nls. Here you just provide the formula. In this case we would have:

    nls(y ~ a * x^2 + b * x + c, d, c(a = 0.2, b = -4, c = -5))
    Nonlinear regression model
      model: y ~ a * x^2 + b * x + c
       data: d
          a       b       c 
     0.2532 -1.3147 -0.6579 
     residual sum-of-squares: 17.7
    
    Number of iterations to convergence: 1 
    Achieved convergence tolerance: 2.816e-08
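
    Since the model is linear in the coefficients, ordinary least squares also fits it in closed form; a minimal sketch with lm, using the d data frame defined above:

    # The quadratic is linear in (a, b, c), so lm() recovers the same
    # least-squares estimates without iteration or starting values.
    lm(y ~ I(x^2) + x, data = d)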
    

    Also, why start at (0.2, -4, -5)? Do you have any prior knowledge? If you start at (0, 0, 0) instead, for example, optim reproduces the nls results, as shown below.
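
    A quick check with the same fun and d as above:

    # Starting from the origin; per the note above, this converges to
    # essentially the same estimates that nls reports.
    optim(c(0, 0, 0), fun, data = d)$par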

    EDIT:

    Since you want the BFGS method, you can supply the analytic gradient of the sum-of-squares objective.
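
    With $\hat{y}_i = a x_i^2 + b x_i + c$ and $S(a, b, c) = \sum_i (y_i - \hat{y}_i)^2$, the chain rule gives

    $$\frac{\partial S}{\partial a} = -2\sum_i (y_i - \hat{y}_i)\, x_i^2, \qquad \frac{\partial S}{\partial b} = -2\sum_i (y_i - \hat{y}_i)\, x_i, \qquad \frac{\partial S}{\partial c} = -2\sum_i (y_i - \hat{y}_i).$$

    In code: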

    fun <- function(par, data){
      y_hat <- data$x^2 * par[1] + data$x * par[2] + par[3]
      sum((y_hat - data$y)^2)
    }
    
    grad <- function(par, data){
      # gradient of the sum of squared residuals with respect to (a, b, c)
      y_hat <- data$x^2 * par[1] + data$x * par[2] + par[3]
      err <- data$y - y_hat
      -2 * c(sum(err * data$x^2), sum(err * data$x), sum(err))
    }
    
    optim(c(0.2, -4, -5), fun, grad, data = d, method = "BFGS")
    $par
    [1]  0.2531732 -1.3146636 -0.6579553
    
    $value
    [1] 17.70249
    
    $counts
    function gradient 
          38        7 
    
    $convergence
    [1] 0
    
    $message
    NULL
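
    Finally, if you want to double-check an analytic gradient, one option is to compare it with a numerical approximation, for example via the numDeriv package (a sketch, assuming the package is installed):

    # Compare the analytic gradient against a numerical one at an arbitrary point.
    p <- c(0.2, -4, -5)
    grad(p, data = d)                  # analytic gradient defined above
    numDeriv::grad(fun, p, data = d)   # numerical approximation; should agree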