Search code examples
rmodelplotnlsmodel-fitting

R Nonlinear Least Squares (nls) Model Fitting


I'm trying to fit the information from the G function of my data to the following mathematical mode: y = A / ((1 + (B^2)*(x^2))^((C+1)/2)) . The shape of this graph can be seen here:

http://www.wolframalpha.com/input/?i=y+%3D+1%2F+%28%281+%2B+%282%5E2%29*%28x%5E2%29%29%5E%28%282%2B1%29%2F2%29%29

Here's a basic example of what I've been doing:

data(simdat)

library(spatstat)

simdat.Gest <- Gest(simdat) #Gest is a function within spatstat (explained below)

Gvalues <- simdat.Gest$rs

Rvalues <- simdat.Gest$r

GvsR_dataframe <- data.frame(R = Rvalues, G = rev(Gvalues))

themodel <- nls(rev(Gvalues) ~ (1 / (1 + (B^2)*(R^2))^((C+1)/2)), data = GvsR_dataframe, start = list(B=0.1, C=0.1), trace = FALSE)

"Gest" is a function found within the 'spatstat' library. It is the G function, or the nearest-neighbour function, which displays the distance between particles on the independent axis, versus the probability of finding a nearest neighbour particle on the dependent axis. Thus, it begins at y=0 and hits a saturation point at y=1.

If you plot simdat.Gest, you'll notice that the curve is 's' shaped, meaning that it starts at y = 0 and ends up at y = 1. For this reason, I reveresed the vector Gvalues, which are the dependent variables. Thus, the information is in the correct orientation to be fitted the above model.

You may also notice that I've automatically set A = 1. This is because G(r) always saturates at 1, so I didn't bother keeping it in the formula.

My problem is that I keep getting errors. For the above example, I get this error:

Error in nls(rev(Gvalues) ~ (1/(1 + (B^2) * (R^2))^((C + 1)/2)), data = GvsR_dataframe,  : 
  singular gradient

I've also been getting this error:

Error in nls(Gvalues1 ~ (1/(1 + (B^2) * (x^2))^((C + 1)/2)), data = G_r_dataframe,  : 
  step factor 0.000488281 reduced below 'minFactor' of 0.000976562

I haven't a clue as to where the first error is coming from. The second, however, I believe was occurring because I did not pick suitable starting values for B and C.

I was hoping that someone could help me figure out where the first error was coming from. Also, what is the most effective way to pick starting values to avoid the second error?

Thanks!


Solution

  • As noted your problem is most likely the starting values. There are two strategies you could use:

    1. Use brute force to find starting values. See package nls2 for a function to do this.
    2. Try to get a sensible guess for starting values. Depending on your values it could be possible to linearize the model.

    G = (1 / (1 + (B^2)*(R^2))^((C+1)/2))

    ln(G)=-(C+1)/2*ln(B^2*R^2+1)

    If B^2*R^2 is large, this becomes approx. ln(G) = -(C+1)*(ln(B)+ln(R)), which is linear.

    If B^2*R^2 is close to 1, it is approx. ln(G) = -(C+1)/2*ln(2), which is constant.

    (Please check for errors, it was late last night due to the soccer game.)

    Edit after additional information has been provided: The data looks like it follows a cumulative distribution function. If it quacks like a duck, it most likely is a duck. And in fact ?Gest states that a CDF is estimated.

    library(spatstat)
    data(simdat)
    simdat.Gest <- Gest(simdat)
    Gvalues <- simdat.Gest$rs
    Rvalues <- simdat.Gest$r
    plot(Gvalues~Rvalues)
    
    #let's try the normal CDF
    fit <- nls(Gvalues~pnorm(Rvalues,mean,sd),start=list(mean=0.4,sd=0.2))
    summary(fit)
    lines(Rvalues,predict(fit))
    #Looks not bad. There might be a better model, but not the one provided in the question.